A longitudinal analysis of crime rates in London: to what extent can socioeconomic factors explain violence?

In [825]:
# Loading the necessary libraries    
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats
import numpy as np
import seaborn as sns
import statsmodels.api as sm
import statistics 

Part 1: Investigating Crime Patterns in London

In [749]:
# Importing crime file (Rates of offences per thousand population)
cr_df = pd.read_excel('crime.xlsx', sheet_name='Crime Rates', header=[1])
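`header=[1]` tells pandas to use the sheet's second row as the column headers (the first row of this workbook is assumed to hold category banner labels rather than column names). The same behaviour can be illustrated on a toy CSV:

```python
import io
import pandas as pd

# Row 0 is a banner line; row 1 holds the real column names.
raw = io.StringIO(
    "banner,banner\n"
    "Code,Borough\n"
    "E09000002,Barking and Dagenham"
)
df = pd.read_csv(raw, header=1)
print(list(df.columns))  # ['Code', 'Borough']
```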
In [750]:
# checking the head of data
cr_df.head()
Out[750]:
Code Borough 1999-00 2000-01 2001-02 2002-03 2003-04 2004-05 2005-06 2006-07 ... 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 E09000002 Barking and Dagenham 120.453818 123.574527 123.981310 122.591776 133.229048 129.472858 132.683807 127.927637 ... 169031.0 172452.0 177580.0 182838.0 187029.0 190560.0 194352.0 198294.0 201979.0 206460.0
2 E09000003 Barnet 97.961840 95.676158 101.020092 107.121465 112.764992 117.035243 106.463008 90.447127 ... 334837.0 339212.0 345829.0 351438.0 357538.0 363956.0 369088.0 374915.0 379691.0 386083.0
3 E09000004 Bexley 95.098824 96.343677 96.911185 90.141154 92.466002 85.366903 89.981159 76.124149 ... 224625.0 226652.0 228146.0 230711.0 232774.0 234271.0 236687.0 239865.0 242142.0 244760.0
4 E09000005 Brent 127.740409 122.716035 123.418144 124.270485 133.559181 132.602903 129.697090 110.211787 ... 283273.0 290901.0 298118.0 304785.0 312245.0 314660.0 317264.0 320762.0 324012.0 328254.0

5 rows × 200 columns

In [751]:
# Removing the area code column as it is not required for our analysis
cr_df.drop('Code', axis=1, inplace=True)
In [752]:
# Checking for missing values 
cr_df.isnull().sum()
Out[752]:
Borough    3
1999-00    5
2000-01    5
2001-02    5
2002-03    4
          ..
2014       4
2015       4
2016       4
Length: 199, dtype: int64

Observations: the rows required for the analysis contain no missing values. The NaNs flagged here come from blank spacer and footnote rows introduced when the Excel sheet was loaded; those rows are removed next.

In [753]:
# Removing missing values
cr_df.dropna(inplace=True)
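By default `dropna` discards any row containing at least one NaN. That is the intended behaviour here, since the flagged rows are spacer and footnote rows, but if a borough had been only partially recorded, `how='all'` would keep it. A minimal sketch on toy data (not the real file):

```python
import pandas as pd
import numpy as np

# Toy frame mimicking the spreadsheet artefacts: one fully-empty spacer
# row and one partially-filled row.
df = pd.DataFrame({
    'Borough': [np.nan, 'Barnet', 'City of London'],
    '1999-00': [np.nan, 97.96, np.nan],
})

# Default dropna removes any row containing at least one NaN ...
print(len(df.dropna()))           # 1
# ... while how='all' only removes rows that are entirely empty.
print(len(df.dropna(how='all')))  # 2
```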
In [754]:
# Checking the head of data
cr_df.head(10)
Out[754]:
Borough 1999-00 2000-01 2001-02 2002-03 2003-04 2004-05 2005-06 2006-07 2007-08 ... 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
1 Barking and Dagenham 120.453818 123.574527 123.981310 122.591776 133.229048 129.472858 132.683807 127.927637 115.576433 ... 169031.0 172452.0 177580.0 182838.0 187029.0 190560.0 194352.0 198294.0 201979.0 206460.0
2 Barnet 97.961840 95.676158 101.020092 107.121465 112.764992 117.035243 106.463008 90.447127 79.576033 ... 334837.0 339212.0 345829.0 351438.0 357538.0 363956.0 369088.0 374915.0 379691.0 386083.0
3 Bexley 95.098824 96.343677 96.911185 90.141154 92.466002 85.366903 89.981159 76.124149 74.840289 ... 224625.0 226652.0 228146.0 230711.0 232774.0 234271.0 236687.0 239865.0 242142.0 244760.0
4 Brent 127.740409 122.716035 123.418144 124.270485 133.559181 132.602903 129.697090 110.211787 99.289378 ... 283273.0 290901.0 298118.0 304785.0 312245.0 314660.0 317264.0 320762.0 324012.0 328254.0
5 Bromley 89.772196 88.521826 96.678797 102.053474 104.012407 103.465441 102.841818 94.443503 83.254853 ... 302625.0 304968.0 306924.0 308560.0 310554.0 314036.0 317899.0 321278.0 324857.0 326889.0
6 Camden 241.048826 238.451579 262.150301 264.202930 249.767446 219.117300 200.087167 201.083248 167.350605 ... 211520.0 210273.0 212924.0 214725.0 220087.0 224962.0 229719.0 234846.0 241059.0 246181.0
7 Croydon 113.661140 105.376061 111.753682 115.134982 110.273608 103.828151 102.813138 92.554245 88.908203 ... 344029.0 349308.0 352763.0 357951.0 364815.0 368886.0 372752.0 376040.0 379031.0 382304.0
8 Ealing 128.161269 126.727338 127.338289 129.823892 123.096447 117.449762 119.239451 116.494201 106.577003 ... 318671.0 324022.0 329966.0 334073.0 339314.0 340671.0 342494.0 342118.0 343059.0 343196.0
9 Enfield 104.821234 101.862085 106.038966 107.008139 106.438164 99.678616 96.471299 94.138031 86.696577 ... 291534.0 297443.0 301971.0 307648.0 313935.0 317287.0 320524.0 324574.0 328433.0 331395.0
10 Greenwich 141.354964 131.343325 133.334866 140.882082 139.752924 137.268366 136.029571 127.479807 129.439618 ... 236535.0 239748.0 243672.0 249171.0 255483.0 260068.0 264008.0 268678.0 274803.0 279766.0

10 rows × 199 columns

In [755]:
# Setting Borough name as index for future manipulations
cr_df.set_index('Borough', inplace=True)
In [756]:
# Checking data types
cr_df.dtypes
Out[756]:
1999-00      float64
2000-01      float64
2001-02      float64
2002-03      float64
2003-04      float64
2004-05      float64
2005-06      float64
2006-07      float64
2007-08      float64
2008-09      float64
2009-10      float64
2010-11      float64
2011-12      float64
2012-13      float64
2013-14      float64
2014-15      float64
2015-16      float64
2016-17      float64
1999-00.1    float64
2000-01.1    float64
2001-02.1    float64
2002-03.1    float64
2003-04.1    float64
2004-05.1    float64
2005-06.1    float64
2006-07.1    float64
2007-08.1    float64
2008-09.1    float64
2009-10.1    float64
2010-11.1    float64
              ...   
2005-06.9    float64
2006-07.9    float64
2007-08.9    float64
2008-09.9    float64
2009-10.9    float64
2010-11.9    float64
2011-12.9    float64
2012-13.9    float64
2013-14.9    float64
2014-15.9    float64
2015-16.9    float64
2016-17.9    float64
1999         float64
2000         float64
2001         float64
2002         float64
2003         float64
2004         float64
2005         float64
2006         float64
2007         float64
2008         float64
2009         float64
2010         float64
2011         float64
2012         float64
2013         float64
2014         float64
2015         float64
2016         float64
Length: 198, dtype: object
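The `.1`, `.9` suffixes above are not in the spreadsheet itself: when a header row repeats the same label (here, the same year headings once per crime-type block), pandas de-duplicates the names on read by appending a numeric suffix. A small illustration:

```python
import io
import pandas as pd

# Duplicate headers in the input are renamed on read: the second and
# third '1999-00' become '1999-00.1' and '1999-00.2'.
raw = io.StringIO("1999-00,1999-00,1999-00\n1.0,2.0,3.0")
df = pd.read_csv(raw)
print(list(df.columns))  # ['1999-00', '1999-00.1', '1999-00.2']
```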
In [757]:
# Keeping only the first 32 rows (the London boroughs); the last three areas are not part of our study
cf_df = cr_df.iloc[0:32, :]
In [758]:
cf_df.head()
Out[758]:
1999-00 2000-01 2001-02 2002-03 2003-04 2004-05 2005-06 2006-07 2007-08 2008-09 ... 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
Borough
Barking and Dagenham 120.453818 123.574527 123.981310 122.591776 133.229048 129.472858 132.683807 127.927637 115.576433 115.898917 ... 169031.0 172452.0 177580.0 182838.0 187029.0 190560.0 194352.0 198294.0 201979.0 206460.0
Barnet 97.961840 95.676158 101.020092 107.121465 112.764992 117.035243 106.463008 90.447127 79.576033 77.750787 ... 334837.0 339212.0 345829.0 351438.0 357538.0 363956.0 369088.0 374915.0 379691.0 386083.0
Bexley 95.098824 96.343677 96.911185 90.141154 92.466002 85.366903 89.981159 76.124149 74.840289 71.347264 ... 224625.0 226652.0 228146.0 230711.0 232774.0 234271.0 236687.0 239865.0 242142.0 244760.0
Brent 127.740409 122.716035 123.418144 124.270485 133.559181 132.602903 129.697090 110.211787 99.289378 93.547289 ... 283273.0 290901.0 298118.0 304785.0 312245.0 314660.0 317264.0 320762.0 324012.0 328254.0
Bromley 89.772196 88.521826 96.678797 102.053474 104.012407 103.465441 102.841818 94.443503 83.254853 86.402508 ... 302625.0 304968.0 306924.0 308560.0 310554.0 314036.0 317899.0 321278.0 324857.0 326889.0

5 rows × 198 columns

Splitting the crime data frame into separate data frames by crime type for later analysis
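Since every crime type occupies a contiguous 18-column block (one column per year, 1999 to 2016), the per-type frames below could also be cut out in a loop. A sketch under that layout assumption; a zero-filled stand-in replaces `cf_df`, and the block offsets mirror the `iloc` positions used in the cells that follow:

```python
import pandas as pd
import numpy as np

# One 18-column block per crime type; the first block (columns 0-17)
# is assumed to hold the overall crime rates.
crime_types = ['violence', 'sex_offences', 'robbery', 'burglary',
               'theft', 'fraud_forgery']
years = [f'Year {y}' for y in range(1999, 2017)]

demo = pd.DataFrame(np.zeros((32, 126)))  # stand-in for cf_df

frames = {}
for i, name in enumerate(crime_types):
    start = 18 + i * 18                     # skip the overall-rates block
    block = demo.iloc[:, start:start + 18].copy()
    block.columns = years
    frames[name] = block

print(frames['robbery'].shape)  # (32, 18)
```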

In [759]:
# Violence data frame
violence = cf_df.iloc[:, 18:36]
In [760]:
# Renaming the columns for readability (years 1999-2016)
violence.columns = [f'Year {y}' for y in range(1999, 2017)]
In [761]:
# Checking the number of rows and columns
violence.shape
Out[761]:
(32, 18)
In [762]:
# Checking the head of data
violence.head()
Out[762]:
Year 1999 Year 2000 Year 2001 Year 2002 Year 2003 Year 2004 Year 2005 Year 2006 Year 2007 Year 2008 Year 2009 Year 2010 Year 2011 Year 2012 Year 2013 Year 2014 Year 2015 Year 2016
Borough
Barking and Dagenham 19.760656 21.989957 23.790551 26.401053 29.492810 31.543989 31.802736 30.809359 29.302317 30.576624 29.350152 24.480688 22.023323 19.736566 21.610274 26.077441 30.186306 24.668217
Barnet 13.538819 14.484584 15.155831 16.365519 18.076333 22.188723 19.518167 16.662586 15.649406 15.282478 14.041622 12.605353 11.618345 11.201904 12.346649 16.217009 18.870608 16.877200
Bexley 14.490154 14.671928 15.263512 16.223765 18.412297 18.690277 19.663566 16.759226 14.370618 15.256870 16.642852 13.272016 11.689450 12.131250 11.711670 15.050132 17.076757 15.754208
Brent 20.770829 21.796977 21.096358 23.333370 27.470623 34.043267 30.460731 22.480687 20.933869 22.292808 23.534305 23.268862 23.337443 19.681561 20.203994 25.165076 27.091589 23.938779
Bromley 10.932445 12.867529 12.973553 15.937007 16.834145 19.380640 18.365445 18.929237 16.674102 18.280606 17.404960 15.698730 14.728517 13.055828 13.925807 18.569588 20.541346 17.207676
In [763]:
# Checking for missing values 
violence.isnull().sum()
Out[763]:
Year 1999    0
Year 2000    0
Year 2001    0
Year 2002    0
Year 2003    0
Year 2004    0
Year 2005    0
Year 2006    0
Year 2007    0
Year 2008    0
Year 2009    0
Year 2010    0
Year 2011    0
Year 2012    0
Year 2013    0
Year 2014    0
Year 2015    0
Year 2016    0
dtype: int64
In [764]:
# Sex offences data frame
sex_offences = cf_df.iloc[:, 36:54]
In [765]:
# Renaming the columns for readability (years 1999-2016)
sex_offences.columns = [f'Year {y}' for y in range(1999, 2017)]
In [766]:
sex_offences.head()
Out[766]:
Year 1999 Year 2000 Year 2001 Year 2002 Year 2003 Year 2004 Year 2005 Year 2006 Year 2007 Year 2008 Year 2009 Year 2010 Year 2011 Year 2012 Year 2013 Year 2014 Year 2015 Year 2016
Borough
Barking and Dagenham 1.145010 1.281324 1.273739 1.508803 1.413874 1.473341 1.551646 1.573371 1.703829 1.188737 1.503548 1.432963 1.470360 1.353904 1.543591 2.032336 2.044767 1.918047
Barnet 0.778386 0.696679 0.766869 0.910929 0.826595 1.170754 0.943393 0.909913 0.803376 0.728158 0.919530 0.862172 0.830681 0.824275 0.964540 1.144259 1.269453 1.201814
Bexley 0.712781 0.864130 1.014825 0.930984 0.722675 1.154280 0.894820 0.810641 0.681135 0.604451 0.775819 0.758525 0.726026 0.759804 1.043572 1.158985 1.230683 1.299232
Brent 1.294575 1.279511 1.238781 1.063471 1.393842 1.110552 1.177387 1.215172 1.210846 0.986590 1.049920 1.118821 1.092091 1.204475 1.153613 1.524495 1.626483 1.523211
Bromley 0.562899 0.629832 0.783207 0.808985 0.783844 1.035966 0.978555 0.867216 0.806278 0.737782 0.912278 0.781047 0.824333 0.761059 0.975152 1.226352 1.271329 1.165533
In [767]:
# Robbery data frame
robbery = cf_df.iloc[:, 54:72]
In [768]:
# Renaming the columns for readability (years 1999-2016)
robbery.columns = [f'Year {y}' for y in range(1999, 2017)]
robbery.head()
Out[768]:
Year 1999 Year 2000 Year 2001 Year 2002 Year 2003 Year 2004 Year 2005 Year 2006 Year 2007 Year 2008 Year 2009 Year 2010 Year 2011 Year 2012 Year 2013 Year 2014 Year 2015 Year 2016
Borough
Barking and Dagenham 3.164167 4.460227 5.197580 4.123662 4.825221 4.462291 4.660953 4.815832 3.662050 4.383829 4.696475 5.124755 5.507167 3.500210 2.706430 2.450906 2.911194 2.077884
Barnet 2.794535 3.296557 4.178652 3.303676 3.362316 3.740853 3.743043 3.213412 2.858107 2.629624 2.807746 2.743016 3.356287 2.948159 1.923660 1.437659 1.638174 0.960933
Bexley 1.328992 1.906573 2.290212 1.547076 1.763508 1.819688 2.126885 2.006449 1.745131 1.658931 1.257966 1.378348 1.361836 1.032992 1.085822 0.846309 0.702067 0.764014
Brent 7.736721 7.326049 8.749351 6.651326 6.734421 7.766411 8.691993 7.753957 7.713407 5.771723 6.245849 6.630904 7.068168 4.620861 3.542791 2.659293 2.987544 2.242166
Bromley 1.756516 2.319541 3.095693 2.366281 2.903251 2.452011 3.009141 3.704774 2.564230 2.590436 1.941849 2.113041 2.260476 1.878765 1.695507 1.204564 1.114336 0.985044
In [769]:
# Burglary data frame
burglary = cf_df.iloc[:, 72:90]
In [770]:
# Renaming the columns for readability (years 1999-2016)
burglary.columns = [f'Year {y}' for y in range(1999, 2017)]
burglary.head()
Out[770]:
Year 1999 Year 2000 Year 2001 Year 2002 Year 2003 Year 2004 Year 2005 Year 2006 Year 2007 Year 2008 Year 2009 Year 2010 Year 2011 Year 2012 Year 2013 Year 2014 Year 2015 Year 2016
Borough
Barking and Dagenham 15.777745 12.020037 13.401427 12.094472 13.476927 13.133265 11.553150 12.664740 9.199496 11.951152 13.683973 12.103611 13.024718 13.329135 10.321479 9.450614 7.594849 5.066357
Barnet 14.514992 13.005725 12.376323 12.600140 12.361639 12.495251 13.320470 11.801657 12.241777 11.078618 12.271961 13.131762 13.743994 13.204893 10.756242 9.788885 10.213568 6.915611
Bexley 13.570437 11.809782 10.943650 10.683497 10.117446 8.989802 10.638920 9.297743 9.705064 9.918289 8.810148 8.296093 6.499867 7.183988 7.533156 5.857461 4.526270 3.644386
Brent 17.517104 16.761969 15.202878 14.299425 15.138471 14.858293 14.409147 12.043225 12.073159 11.945645 12.511824 11.893630 12.730388 11.777792 8.995663 8.077640 8.385492 6.763055
Bromley 12.383775 11.049144 12.085694 12.121294 11.680286 12.700676 13.111973 11.921731 11.016935 11.899609 11.719514 9.719341 11.025458 10.486059 9.644573 7.843674 7.868077 5.411623
In [771]:
# Theft data frame
theft = cf_df.iloc[:, 90:108]
In [772]:
# Renaming the columns for readability (years 1999-2016)
theft.columns = [f'Year {y}' for y in range(1999, 2017)]
theft.head()
Out[772]:
Year 1999 Year 2000 Year 2001 Year 2002 Year 2003 Year 2004 Year 2005 Year 2006 Year 2007 Year 2008 Year 2009 Year 2010 Year 2011 Year 2012 Year 2013 Year 2014 Year 2015 Year 2016
Borough
Barking and Dagenham 40.777129 41.880983 42.359376 40.064440 42.891523 40.402150 43.265674 39.124895 36.525844 32.727947 31.906746 34.538772 33.385197 32.756087 30.249238 26.153086 26.482951 23.423423
Barnet 35.087999 35.866288 37.792545 42.579675 45.189278 48.056517 46.406404 35.725406 30.352082 30.063795 28.282764 27.600886 28.338247 28.734243 26.199714 24.186816 24.351380 22.088515
Bexley 36.080530 35.808831 31.834410 30.772671 30.134172 25.303621 27.591045 22.021677 24.921536 23.189736 21.381045 19.123492 17.265674 18.807279 17.947754 17.705793 17.502127 15.141363
Brent 39.905193 39.143973 40.171352 41.816275 45.612937 41.586077 41.968856 37.359315 30.984245 26.792620 28.428340 27.376675 28.711429 26.037628 24.515861 24.008455 26.718146 23.713344
Bromley 36.846139 34.962430 39.221114 40.085213 39.464698 34.947041 36.864482 32.176713 28.837670 30.639280 24.768347 24.721286 24.781519 25.532105 24.133451 23.459434 23.311796 19.502033
In [773]:
# Fraud and Forgery data frame
fraud_forgery = cf_df.iloc[:, 108:126]
In [774]:
# Slightly changing the column names for readability 
fraud_forgery.columns = [f'Year {y}' for y in range(1999, 2017)]
fraud_forgery.head()
Out[774]:
Year 1999 Year 2000 Year 2001 Year 2002 Year 2003 Year 2004 Year 2005 Year 2006 Year 2007 Year 2008 Year 2009 Year 2010 Year 2011 Year 2012 Year 2013 Year 2014 Year 2015 Year 2016
Borough
Barking and Dagenham 13.500037 14.564380 12.266531 10.615724 10.456651 11.062134 10.825440 9.482104 7.075625 6.912068 7.354432 6.612411 6.731576 3.647145 0.154359 0.126075 0.118824 0.077497
Barnet 12.498844 9.816837 12.301201 12.824752 13.458586 11.482039 6.118318 6.275676 2.908878 4.339469 4.392344 4.131596 5.288948 3.967513 0.121922 0.090687 0.142221 0.069933
Bexley 6.157511 5.792874 6.532362 6.051396 6.231365 4.970192 4.838325 3.730742 3.178631 2.739883 2.936716 2.583319 2.822480 1.844018 0.046475 0.025014 0.041298 0.044942
Brent 19.022960 15.531525 15.199169 16.229977 15.462707 12.089366 7.060630 5.377861 3.978494 4.902011 5.199284 4.373575 4.752678 3.731011 0.107166 0.062352 0.117280 0.088346
Bromley 6.870079 4.737282 5.782903 7.894347 7.700511 7.456938 6.545967 4.405857 3.733994 4.056163 4.708006 4.968240 4.495192 1.795972 0.040893 0.052914 0.098505 0.085656
In [775]:
# Criminal damage data frame
criminal_damage = cf_df.iloc[:, 126:144]
In [776]:
# Slightly changing the column names for readability 
criminal_damage.columns = [f'Year {y}' for y in range(1999, 2017)]
criminal_damage.head()
Out[776]:
Year 1999 Year 2000 Year 2001 Year 2002 Year 2003 Year 2004 Year 2005 Year 2006 Year 2007 Year 2008 Year 2009 Year 2010 Year 2011 Year 2012 Year 2013 Year 2014 Year 2015 Year 2016
Borough
Barking and Dagenham 22.567777 23.850927 21.774301 21.009035 24.463029 23.392307 23.773869 22.110950 18.890026 17.633892 14.883433 13.224822 11.174738 8.779387 8.474315 8.436967 9.664371 7.609222
Barnet 16.301452 16.036278 15.938350 15.803988 16.385852 14.487695 13.463963 11.937691 10.019801 8.740846 7.847809 7.651421 6.606291 5.423733 5.323934 5.579931 5.941674 4.781355
Bexley 20.187806 22.577120 26.175162 20.823921 22.012035 21.365490 21.053010 18.053565 15.323317 12.843478 11.374295 9.245333 7.187229 6.594926 5.906535 6.361912 6.648991 5.862886
Brent 16.410761 16.414728 16.764335 15.755676 16.670207 14.612332 13.324032 11.934728 10.276306 9.147442 9.221181 8.419049 7.567775 6.607132 6.452040 6.718377 6.613953 5.867408
Bromley 17.914426 19.609437 20.235097 20.133617 21.900536 22.290091 20.843561 18.361061 15.372160 13.476824 10.803978 9.100337 8.346375 6.881377 6.439152 7.171359 6.953829 5.616585
In [777]:
# Drugs data frame
drugs = cf_df.iloc[:, 144:162]
In [778]:
# Slightly changing the column names for readability 
drugs.columns = [f'Year {y}' for y in range(1999, 2017)]
drugs.head()
Out[778]:
Year 1999 Year 2000 Year 2001 Year 2002 Year 2003 Year 2004 Year 2005 Year 2006 Year 2007 Year 2008 Year 2009 Year 2010 Year 2011 Year 2012 Year 2013 Year 2014 Year 2015 Year 2016
Borough
Barking and Dagenham 2.733250 2.477226 2.819129 5.217694 4.704891 2.524002 3.891144 5.892664 7.779638 8.819846 6.380223 5.534954 5.496474 5.982368 6.241253 5.491845 5.792681 4.523879
Barnet 1.575913 1.532693 1.709022 1.940403 1.967048 2.495961 2.289790 3.068310 3.858594 3.897268 4.516683 3.616570 3.003877 2.577235 3.058891 2.413880 2.228128 1.895965
Bexley 1.724471 2.016304 1.997650 2.336587 2.186205 2.263293 2.302251 2.490147 4.171397 4.385578 3.773899 4.217398 4.025364 3.419117 3.743340 3.110083 2.903255 2.010132
Brent 3.722385 3.208213 3.786811 3.987090 3.820023 4.837237 11.419545 10.665307 10.968218 10.670297 9.985979 12.251259 14.456597 11.609356 10.732387 8.492278 5.746701 5.212427
Bromley 1.780252 1.506855 1.708201 1.840441 1.920923 2.435193 2.544912 3.309377 3.529120 3.925002 4.170414 3.503370 3.406815 3.273510 3.453927 2.947603 2.644240 1.713120
In [779]:
# Other offences data frame
other_offences = cf_df.iloc[:, 162:180]
In [780]:
# Slightly changing the column names for readability 
other_offences.columns = [f'Year {y}' for y in range(1999, 2017)]
other_offences.head()
Out[780]:
Year 1999 Year 2000 Year 2001 Year 2002 Year 2003 Year 2004 Year 2005 Year 2006 Year 2007 Year 2008 Year 2009 Year 2010 Year 2011 Year 2012 Year 2013 Year 2014 Year 2015 Year 2016
Borough
Barking and Dagenham 1.028047 1.049465 1.098676 1.556893 1.504121 1.479379 1.359194 1.453723 1.437606 1.704822 1.982205 1.832223 1.839287 1.385390 1.559027 1.654110 1.732853 1.535406
Barnet 0.870900 0.940516 0.801300 0.792383 1.137345 0.917451 0.659459 0.852476 0.884012 0.990531 0.890614 0.788190 0.718805 0.807790 0.872421 1.000227 1.129866 1.023096
Bexley 0.846140 0.896135 0.859401 0.771256 0.886299 0.810259 0.872337 0.953959 0.743461 0.750049 0.771436 0.676171 0.695954 0.879324 0.992872 1.079774 1.585846 1.086779
Brent 1.359880 1.253090 1.209109 1.133875 1.255949 1.699368 1.184769 1.381535 1.150833 1.038154 1.254537 1.430517 1.460392 1.363376 1.443593 1.546318 1.351802 1.440957
Bromley 0.725665 0.839776 0.793335 0.866288 0.824214 0.766884 0.577782 0.767536 0.720363 0.796805 1.052378 0.794011 0.660111 0.812009 0.858763 1.126750 1.262094 1.061522
In [781]:
# All offences together (overall crime rate) data frame
all_offences = cf_df.iloc[:, 0:18]
In [782]:
all_offences.head()
Out[782]:
1999-00 2000-01 2001-02 2002-03 2003-04 2004-05 2005-06 2006-07 2007-08 2008-09 2009-10 2010-11 2011-12 2012-13 2013-14 2014-15 2015-16 2016-17
Borough
Barking and Dagenham 120.453818 123.574527 123.981310 122.591776 133.229048 129.472858 132.683807 127.927637 115.576433 115.898917 111.741187 104.885199 100.652840 90.470193 82.859965 81.873380 86.528798 70.899932
Barnet 97.961840 95.676158 101.020092 107.121465 112.764992 117.035243 106.463008 90.447127 79.576033 77.750787 75.971072 73.130965 73.505474 69.689743 61.567973 61.859355 65.785073 55.814423
Bexley 95.098824 96.343677 96.911185 90.141154 92.466002 85.366903 89.981159 76.124149 74.840289 71.347264 67.724177 59.550693 52.273879 52.652697 50.011196 51.195464 52.217294 45.607942
Brent 127.740409 122.716035 123.418144 124.270485 133.559181 132.602903 129.697090 110.211787 99.289378 93.547289 97.431219 96.763292 101.176960 86.633191 77.147108 78.254282 80.638989 70.789693
Bromley 89.772196 88.521826 96.678797 102.053474 104.012407 103.465441 102.841818 94.443503 83.254853 86.402508 77.481722 71.399404 70.528797 64.476684 61.167226 63.602239 65.065552 52.748792
In [783]:
# Slightly changing the column names for readability 
all_offences.columns = [f'Year {y}' for y in range(1999, 2017)]
In [784]:
all_offences.head()
Out[784]:
Year 1999 Year 2000 Year 2001 Year 2002 Year 2003 Year 2004 Year 2005 Year 2006 Year 2007 Year 2008 Year 2009 Year 2010 Year 2011 Year 2012 Year 2013 Year 2014 Year 2015 Year 2016
Borough
Barking and Dagenham 120.453818 123.574527 123.981310 122.591776 133.229048 129.472858 132.683807 127.927637 115.576433 115.898917 111.741187 104.885199 100.652840 90.470193 82.859965 81.873380 86.528798 70.899932
Barnet 97.961840 95.676158 101.020092 107.121465 112.764992 117.035243 106.463008 90.447127 79.576033 77.750787 75.971072 73.130965 73.505474 69.689743 61.567973 61.859355 65.785073 55.814423
Bexley 95.098824 96.343677 96.911185 90.141154 92.466002 85.366903 89.981159 76.124149 74.840289 71.347264 67.724177 59.550693 52.273879 52.652697 50.011196 51.195464 52.217294 45.607942
Brent 127.740409 122.716035 123.418144 124.270485 133.559181 132.602903 129.697090 110.211787 99.289378 93.547289 97.431219 96.763292 101.176960 86.633191 77.147108 78.254282 80.638989 70.789693
Bromley 89.772196 88.521826 96.678797 102.053474 104.012407 103.465441 102.841818 94.443503 83.254853 86.402508 77.481722 71.399404 70.528797 64.476684 61.167226 63.602239 65.065552 52.748792
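The eight slice-and-rename cells above all repeat the same pattern, so the duplication could be factored into a small helper. A minimal sketch on a toy frame — the helper name `slice_and_rename` is my own, not from this notebook:

```python
import pandas as pd

def slice_and_rename(df, start, stop, first_year=1999):
    """Take a block of annual columns and relabel them 'Year <first_year>', 'Year <first_year+1>', ..."""
    block = df.iloc[:, start:stop].copy()
    block.columns = [f'Year {first_year + i}' for i in range(block.shape[1])]
    return block

# Toy frame standing in for the crime data: 2 boroughs x 4 annual columns
toy = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                   index=['Barnet', 'Bexley'],
                   columns=['1999-00', '2000-01', '2001-02', '2002-03'])
print(list(slice_and_rename(toy, 0, 4).columns))
# ['Year 1999', 'Year 2000', 'Year 2001', 'Year 2002']
```

Using `.copy()` also avoids pandas' SettingWithCopy warning that can arise when assigning `.columns` on a slice of another frame.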

Investigating the change in overall crime rate (all offences) in London (by borough)

In [785]:
# Transposing so that year becomes the index, converting the 'all offences' data frame to a time series.
# This will help with analysing trends in the crime rate.
all_offences_time_series = all_offences.T
all_offences_time_series.index.name = 'Year'
all_offences_time_series.head()
Out[785]:
Borough Barking and Dagenham Barnet Bexley Brent Bromley Camden Croydon Ealing Enfield Greenwich ... Merton Newham Redbridge Richmond upon Thames Southwark Sutton Tower Hamlets Waltham Forest Wandsworth Westminster
Year
Year 1999 120.453818 97.961840 95.098824 127.740409 89.772196 241.048826 113.661140 128.161269 104.821234 141.354964 ... 90.845230 159.385823 103.209064 89.731569 176.451364 80.797213 176.071150 122.298774 134.459920 463.000639
Year 2000 123.574527 95.676158 96.343677 122.716035 88.521826 238.451579 105.376061 126.727338 101.862085 131.343325 ... 88.041191 157.970855 105.166175 84.912098 160.042892 77.225254 177.900199 116.454884 129.834326 424.368123
Year 2001 123.981310 101.020092 96.911185 123.418144 96.678797 262.150301 111.753682 127.338289 106.038966 133.334866 ... 92.864693 162.847669 115.757794 92.116963 178.047773 84.640777 185.354816 136.297097 131.437172 424.287731
Year 2002 122.591776 107.121465 90.141154 124.270485 102.053474 264.202930 115.134982 129.823892 107.008139 140.882082 ... 93.828223 161.221708 117.642965 95.838487 179.342800 82.680054 198.636919 144.179894 131.690656 413.988467
Year 2003 133.229048 112.764992 92.466002 133.559181 104.012407 249.767446 110.273608 123.096447 106.438164 139.752924 ... 98.986783 158.513644 114.941878 87.175898 181.302450 89.117159 187.816018 137.929188 121.255798 375.794397

5 rows × 32 columns

In [786]:
# Let's see how the all-offences rate changed from Year 1999 to Year 2016.
x = all_offences["Year 1999"]
y = all_offences["Year 2016"]
print(np.corrcoef(x, y))
plt.scatter(x, y)
plt.xlabel('Year 1999')
plt.ylabel('Year 2016')
plt.title('All offences in Year 1999 and Year 2016')
[[1.         0.97290128]
 [0.97290128 1.        ]]
Out[786]:
Text(0.5, 1.0, 'All offences in Year 1999 and Year 2016')

Observations: We can see a strong positive correlation (0.97) between borough crime rates in 1999 and 2016: boroughs largely kept their relative ranking even as rates fell across the board. We can also observe one obvious outlier.
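Such an outlier can also be flagged programmatically rather than by eye, for example with a simple z-score on the rates. A hedged sketch on synthetic numbers (the real run would use `all_offences["Year 1999"]`):

```python
import numpy as np

# Synthetic borough rates: seven typical values plus one Westminster-like extreme
rates = np.array([90.0, 100.0, 110.0, 120.0, 130.0, 95.0, 105.0, 463.0])

# Absolute z-score: distance from the mean in units of standard deviation
z = np.abs((rates - rates.mean()) / rates.std())
outlier_idx = int(np.argmax(z))
print(outlier_idx, rates[outlier_idx])
# 7 463.0
```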

In [787]:
# Converting to a datetime index. The year-end dates may not match the original reporting periods exactly, but they cover the same spans.
rng = pd.date_range(start='1999-12-31', periods=18, freq='Y')
rng
Out[787]:
DatetimeIndex(['1999-12-31', '2000-12-31', '2001-12-31', '2002-12-31',
               '2003-12-31', '2004-12-31', '2005-12-31', '2006-12-31',
               '2007-12-31', '2008-12-31', '2009-12-31', '2010-12-31',
               '2011-12-31', '2012-12-31', '2013-12-31', '2014-12-31',
               '2015-12-31', '2016-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')
In [788]:
all_offences_time_series.set_index(rng,inplace = True  )
In [789]:
all_offences_time_series.head()
Out[789]:
Borough Barking and Dagenham Barnet Bexley Brent Bromley Camden Croydon Ealing Enfield Greenwich ... Merton Newham Redbridge Richmond upon Thames Southwark Sutton Tower Hamlets Waltham Forest Wandsworth Westminster
1999-12-31 120.453818 97.961840 95.098824 127.740409 89.772196 241.048826 113.661140 128.161269 104.821234 141.354964 ... 90.845230 159.385823 103.209064 89.731569 176.451364 80.797213 176.071150 122.298774 134.459920 463.000639
2000-12-31 123.574527 95.676158 96.343677 122.716035 88.521826 238.451579 105.376061 126.727338 101.862085 131.343325 ... 88.041191 157.970855 105.166175 84.912098 160.042892 77.225254 177.900199 116.454884 129.834326 424.368123
2001-12-31 123.981310 101.020092 96.911185 123.418144 96.678797 262.150301 111.753682 127.338289 106.038966 133.334866 ... 92.864693 162.847669 115.757794 92.116963 178.047773 84.640777 185.354816 136.297097 131.437172 424.287731
2002-12-31 122.591776 107.121465 90.141154 124.270485 102.053474 264.202930 115.134982 129.823892 107.008139 140.882082 ... 93.828223 161.221708 117.642965 95.838487 179.342800 82.680054 198.636919 144.179894 131.690656 413.988467
2003-12-31 133.229048 112.764992 92.466002 133.559181 104.012407 249.767446 110.273608 123.096447 106.438164 139.752924 ... 98.986783 158.513644 114.941878 87.175898 181.302450 89.117159 187.816018 137.929188 121.255798 375.794397

5 rows × 32 columns
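The `'Y'` frequency alias used above has been renamed across pandas versions (newer releases prefer `'YE'`), so a version-proof alternative is to build the year-end dates explicitly. A sketch on a toy frame:

```python
import pandas as pd

# Toy wide frame with 'Year NNNN' row labels, standing in for the transposed data
ts = pd.DataFrame({'Barnet': [98.0, 96.0, 101.0]},
                  index=['Year 1999', 'Year 2000', 'Year 2001'])

# Build explicit year-end timestamps instead of relying on a frequency alias
idx = pd.to_datetime([f'{y}-12-31' for y in range(1999, 1999 + len(ts))])
ts = ts.set_index(idx)
print(ts.index.year.tolist())
# [1999, 2000, 2001]
```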

In [790]:
# Displaying the change in total crime rates by borough between years 1999 and 2016
ux = all_offences_time_series.plot(colormap='tab10', fontsize=10, linewidth=4, figsize=(20,12) )

# Set labels and legend
ux.set_xlabel('Year', fontsize=15)
ux.set_ylabel('Crime Rate', fontsize=15)
ux.set_title('Crime rate in London by borough (Years 1999-2016 )', fontsize=20)
ux.legend(loc='center left', bbox_to_anchor=(1.0, 0.5), fontsize=15)
Out[790]:
<matplotlib.legend.Legend at 0x20e771534a8>

Observations: As the above graph shows, the overall crime rate (all offences) trends downward for every borough. Westminster visibly has the highest crime rate throughout the whole 1999-2016 period, and we can confidently say it is the outlier seen on the scatter plot above. The majority of boroughs do not differ significantly from each other.
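The "highest throughout the period" claim can be checked numerically instead of visually, e.g. by taking each borough's mean rate across all years. A minimal sketch on toy numbers (the real run would use `all_offences_time_series`):

```python
import pandas as pd

# Toy time series: two boroughs, three annual observations
toy = pd.DataFrame({'Westminster': [463.0, 424.0, 424.0],
                    'Barnet': [98.0, 96.0, 101.0]})

# Borough with the highest average rate over the period
highest = toy.mean().idxmax()
print(highest)
# Westminster
```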

In [791]:
# Finding the percentage change in the all-offences rate between 1999 and 2016
change_1999_2016 = all_offences_time_series.pct_change(freq='17Y')
ch = change_1999_2016.iloc[17, :]
ch.head(32)
Out[791]:
Borough
Barking and Dagenham     -0.411393
Barnet                   -0.430243
Bexley                   -0.520415
Brent                    -0.445832
Bromley                  -0.412415
Camden                   -0.572139
Croydon                  -0.406786
Ealing                   -0.461537
Enfield                  -0.441493
Greenwich                -0.493682
Hackney                  -0.534761
Hammersmith and Fulham   -0.448860
Haringey                 -0.449607
Harrow                   -0.436600
Havering                 -0.302289
Hillingdon               -0.408271
Hounslow                 -0.507450
Islington                -0.543363
Kensington and Chelsea   -0.473255
Kingston upon Thames     -0.505417
Lambeth                  -0.535789
Lewisham                 -0.421951
Merton                   -0.410874
Newham                   -0.515963
Redbridge                -0.450773
Richmond upon Thames     -0.443482
Southwark                -0.501905
Sutton                   -0.442922
Tower Hamlets            -0.515519
Waltham Forest           -0.469428
Wandsworth               -0.510402
Westminster              -0.630042
Name: 2016-12-31 00:00:00, dtype: float64
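The `pct_change(freq='17Y')` call computes the change against the row 17 years earlier, so with this annual index only the 2016 row is non-NaN (hence the `iloc[17, :]` selection). An equivalent, frequency-free way to get the full-span change, sketched on toy data:

```python
import pandas as pd

# Toy annual series for two boroughs: first row is 1999, last row is 2016
toy = pd.DataFrame({'Bexley': [95.0, 90.0, 45.6],
                    'Brent': [127.7, 124.3, 70.8]})

# Fractional change from the first year to the last year
overall_change = toy.iloc[-1] / toy.iloc[0] - 1
print(overall_change.round(3).to_dict())
```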
In [792]:
# Displaying the percentage change in crime rate between years 1999 and 2016
ch.plot(kind='bar', figsize=(10, 10), title='Percentage change in crime rate between years 1999 and 2016')
plt.style.use('fivethirtyeight')

Observations: Westminster and Camden not only had the highest crime rates in 1999 but also showed the biggest drops by 2016 (63% for Westminster and 57% for Camden). Havering had the smallest percentage drop (30%). It seems that the higher the initial crime rate, the larger the drop between 1999 and 2016.
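The closing claim (higher initial rate, larger subsequent drop) could be quantified as a correlation between the 1999 rate and the percentage change. A hedged sketch on made-up numbers shaped like the output above, not the real borough values:

```python
import numpy as np

initial = np.array([90.0, 120.0, 240.0, 460.0])      # synthetic 1999 rates
pct_change = np.array([-0.30, -0.41, -0.57, -0.63])  # synthetic 1999->2016 change

# Pearson correlation: strongly negative means higher start, bigger drop
r = np.corrcoef(initial, pct_change)[0, 1]
print(round(r, 2))
```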

Investigating the variation in types of crime in years 1999 and 2016, with a focus on violence.

In [793]:
# All offences in year 1999
all_offences1999 = all_offences.iloc[:, 0]
all_offences1999.head()
Out[793]:
Borough
Barking and Dagenham    120.453818
Barnet                   97.961840
Bexley                   95.098824
Brent                   127.740409
Bromley                  89.772196
Name: Year 1999, dtype: float64
In [794]:
# All offences in year 2016 
all_offences2016 = all_offences.iloc[:, 17]
all_offences2016.head()
Out[794]:
Borough
Barking and Dagenham    70.899932
Barnet                  55.814423
Bexley                  45.607942
Brent                   70.789693
Bromley                 52.748792
Name: Year 2016, dtype: float64
In [795]:
# Violence Against the Person in year 1999
violence1999 = violence.iloc[:, 0]
violence1999.head()
Out[795]:
Borough
Barking and Dagenham    19.760656
Barnet                  13.538819
Bexley                  14.490154
Brent                   20.770829
Bromley                 10.932445
Name: Year 1999, dtype: float64
In [796]:
# Violence Against the Person in year 2016
violence2016 = violence.iloc[:, 17]
violence2016.head()
Out[796]:
Borough
Barking and Dagenham    24.668217
Barnet                  16.877200
Bexley                  15.754208
Brent                   23.938779
Bromley                 17.207676
Name: Year 2016, dtype: float64
In [797]:
# Sexual Offences in year 1999
sex_offences1999 = sex_offences.iloc[:, 0]
sex_offences1999.head()
Out[797]:
Borough
Barking and Dagenham    1.145010
Barnet                  0.778386
Bexley                  0.712781
Brent                   1.294575
Bromley                 0.562899
Name: Year 1999, dtype: float64
In [798]:
# Sexual Offences in year 2016
sex_offences2016 = sex_offences.iloc[:, 17]
sex_offences2016.head()
Out[798]:
Borough
Barking and Dagenham    1.918047
Barnet                  1.201814
Bexley                  1.299232
Brent                   1.523211
Bromley                 1.165533
Name: Year 2016, dtype: float64
In [799]:
# Robbery in year 1999
robbery1999 = robbery.iloc[:, 0]
robbery1999.head()
Out[799]:
Borough
Barking and Dagenham    3.164167
Barnet                  2.794535
Bexley                  1.328992
Brent                   7.736721
Bromley                 1.756516
Name: Year 1999, dtype: float64
In [800]:
# Robbery in year 2016
robbery2016 = robbery.iloc[:, 17]
robbery2016.head()
Out[800]:
Borough
Barking and Dagenham    2.077884
Barnet                  0.960933
Bexley                  0.764014
Brent                   2.242166
Bromley                 0.985044
Name: Year 2016, dtype: float64
In [801]:
# Burglary in year 1999
burglary1999 = burglary.iloc[:, 0]
burglary1999.head()
Out[801]:
Borough
Barking and Dagenham    15.777745
Barnet                  14.514992
Bexley                  13.570437
Brent                   17.517104
Bromley                 12.383775
Name: Year 1999, dtype: float64
In [802]:
# Burglary in year 2016
burglary2016 = burglary.iloc[:, 17]
burglary2016.head()
Out[802]:
Borough
Barking and Dagenham    5.066357
Barnet                  6.915611
Bexley                  3.644386
Brent                   6.763055
Bromley                 5.411623
Name: Year 2016, dtype: float64
In [803]:
# Theft and Handling in year 1999
theft1999 = theft.iloc[:, 0]
theft1999.head()
Out[803]:
Borough
Barking and Dagenham    40.777129
Barnet                  35.087999
Bexley                  36.080530
Brent                   39.905193
Bromley                 36.846139
Name: Year 1999, dtype: float64
In [804]:
# Theft and Handling in year 2016
theft2016 = theft.iloc[:, 17]
theft2016.head()
Out[804]:
Borough
Barking and Dagenham    23.423423
Barnet                  22.088515
Bexley                  15.141363
Brent                   23.713344
Bromley                 19.502033
Name: Year 2016, dtype: float64
In [805]:
# Fraud and Forgery in year 1999
fraud_forgery1999 = fraud_forgery.iloc[:, 0]
fraud_forgery1999.head()
Out[805]:
Borough
Barking and Dagenham    13.500037
Barnet                  12.498844
Bexley                   6.157511
Brent                   19.022960
Bromley                  6.870079
Name: Year 1999, dtype: float64
In [806]:
# Fraud and Forgery in year 2016
fraud_forgery2016 = fraud_forgery.iloc[:, 17]
fraud_forgery2016.head()
Out[806]:
Borough
Barking and Dagenham    0.077497
Barnet                  0.069933
Bexley                  0.044942
Brent                   0.088346
Bromley                 0.085656
Name: Year 2016, dtype: float64
In [807]:
# Criminal Damage in year 1999
criminal_damage1999 = criminal_damage.iloc[:, 0]
criminal_damage1999.head()
Out[807]:
Borough
Barking and Dagenham    22.567777
Barnet                  16.301452
Bexley                  20.187806
Brent                   16.410761
Bromley                 17.914426
Name: Year 1999, dtype: float64
In [808]:
# Criminal Damage in year 2016
criminal_damage2016 = criminal_damage.iloc[:, 17]
criminal_damage2016.head()
Out[808]:
Borough
Barking and Dagenham    7.609222
Barnet                  4.781355
Bexley                  5.862886
Brent                   5.867408
Bromley                 5.616585
Name: Year 2016, dtype: float64
In [809]:
# Drugs in year 1999
drugs1999 = drugs.iloc[:, 0]
drugs1999.head()
Out[809]:
Borough
Barking and Dagenham    2.733250
Barnet                  1.575913
Bexley                  1.724471
Brent                   3.722385
Bromley                 1.780252
Name: Year 1999, dtype: float64
In [810]:
# Drugs in year 2016
drugs2016 = drugs.iloc[:, 17]
drugs2016.head()
Out[810]:
Borough
Barking and Dagenham    4.523879
Barnet                  1.895965
Bexley                  2.010132
Brent                   5.212427
Bromley                 1.713120
Name: Year 2016, dtype: float64
In [811]:
# Other offences in year 1999
other_offences1999 = other_offences.iloc[:, 0]
other_offences1999.head()
Out[811]:
Borough
Barking and Dagenham    1.028047
Barnet                  0.870900
Bexley                  0.846140
Brent                   1.359880
Bromley                 0.725665
Name: Year 1999, dtype: float64
In [812]:
# Other offences in year 2016
other_offences2016 = other_offences.iloc[:, 17]
other_offences2016.head()
Out[812]:
Borough
Barking and Dagenham    1.535406
Barnet                  1.023096
Bexley                  1.086779
Brent                   1.440957
Bromley                 1.061522
Name: Year 2016, dtype: float64
In [813]:
# Combining all crime types in year 1999
crime_by_type_1999 = pd.concat([violence1999, sex_offences1999, robbery1999, burglary1999, theft1999, fraud_forgery1999, criminal_damage1999, drugs1999, other_offences1999], keys = ['Violence Against the Person', 'Sexual Offences','Robbery','Burglary','Theft and Handling','Fraud or Forgery','Criminal Damage', 'Drugs','Other Notifiable Offences'], axis = 1)
In [814]:
crime_by_type_1999.head()
Out[814]:
Violence Against the Person Sexual Offences Robbery Burglary Theft and Handling Fraud or Forgery Criminal Damage Drugs Other Notifiable Offences
Borough
Barking and Dagenham 19.760656 1.145010 3.164167 15.777745 40.777129 13.500037 22.567777 2.733250 1.028047
Barnet 13.538819 0.778386 2.794535 14.514992 35.087999 12.498844 16.301452 1.575913 0.870900
Bexley 14.490154 0.712781 1.328992 13.570437 36.080530 6.157511 20.187806 1.724471 0.846140
Brent 20.770829 1.294575 7.736721 17.517104 39.905193 19.022960 16.410761 3.722385 1.359880
Bromley 10.932445 0.562899 1.756516 12.383775 36.846139 6.870079 17.914426 1.780252 0.725665
In [815]:
crime_by_type_1999.shape
Out[815]:
(32, 9)
In [816]:
# Combining all crime types in year 2016
crime_by_type_2016 = pd.concat([violence2016, sex_offences2016, robbery2016, burglary2016, theft2016, fraud_forgery2016, criminal_damage2016, drugs2016, other_offences2016], keys = ['Violence Against the Person', 'Sexual Offences','Robbery','Burglary','Theft and Handling','Fraud or Forgery','Criminal Damage', 'Drugs','Other Notifiable Offences'], axis = 1)
In [817]:
crime_by_type_2016.head()
Out[817]:
Violence Against the Person Sexual Offences Robbery Burglary Theft and Handling Fraud or Forgery Criminal Damage Drugs Other Notifiable Offences
Borough
Barking and Dagenham 24.668217 1.918047 2.077884 5.066357 23.423423 0.077497 7.609222 4.523879 1.535406
Barnet 16.877200 1.201814 0.960933 6.915611 22.088515 0.069933 4.781355 1.895965 1.023096
Bexley 15.754208 1.299232 0.764014 3.644386 15.141363 0.044942 5.862886 2.010132 1.086779
Brent 23.938779 1.523211 2.242166 6.763055 23.713344 0.088346 5.867408 5.212427 1.440957
Bromley 17.207676 1.165533 0.985044 5.411623 19.502033 0.085656 5.616585 1.713120 1.061522
In [818]:
# Plotting all crimes to see the proportional division of types of crime in year 1999
crime_by_type_1999.plot.bar(stacked=True, edgecolor='black', figsize=(13, 13), title='Crime Rate By Type In 1999')
plt.legend(fontsize=12)
plt.ylabel('Crime Rate')
plt.xticks(fontsize=10)
Out[818]:
(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]),
 <a list of 32 Text xticklabel objects>)

Observations: As the above graph shows, Theft and Handling accounted for the largest share of all offences in 1999.

In [819]:
#Plotting all crimes to see the proportional division of types of crime in year 2016
crime_by_type_2016.plot.bar(stacked=True, edgecolor='black', figsize=(13, 13), title='Crime Rate By Type In 2016')
plt.legend(fontsize=12)
plt.ylabel('Crime Rate')
plt.xticks(fontsize=10)
Out[819]:
(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]),
 <a list of 32 Text xticklabel objects>)

Observations: As demonstrated above, in 2016 Theft and Handling was still the largest category for the majority of boroughs, but its share of all offences was smaller than before. Violence against the person grew noticeably as a proportion of all crime, while the changes in the remaining crime types are less visible.
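The stacked bars show absolute rates, so composition must be judged by eye. To compare composition directly, each borough's row could be normalized to percentages before plotting. A minimal sketch on toy numbers:

```python
import pandas as pd

# Toy crime-by-type frame: two boroughs, two crime types
toy = pd.DataFrame({'Theft and Handling': [40.0, 35.0],
                    'Violence Against the Person': [20.0, 14.0]},
                   index=['Barking and Dagenham', 'Barnet'])

# Divide each row by its own total so every borough's shares sum to 100
shares = toy.div(toy.sum(axis=1), axis=0) * 100
print(shares.round(1))
```

Plotting `shares` as a stacked bar chart would give every borough the same bar height, making shifts in composition easier to see.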

In [820]:
# Checking the proportion of violence against the person in all offences in year 1999
proportion_of_violence1999 = (violence1999 /all_offences1999) * 100 
In [821]:
print(proportion_of_violence1999)
Borough
Barking and Dagenham      16.405172
Barnet                    13.820503
Bexley                    15.236944
Brent                     16.260187
Bromley                   12.177986
Camden                    11.663755
Croydon                   15.579048
Ealing                    17.172213
Enfield                   14.131804
Greenwich                 20.786236
Hackney                   17.061040
Hammersmith and Fulham    16.374126
Haringey                  14.303742
Harrow                    13.959588
Havering                  14.164050
Hillingdon                14.128077
Hounslow                  16.756896
Islington                 13.675280
Kensington and Chelsea    10.702102
Kingston upon Thames      16.533552
Lambeth                   15.787558
Lewisham                  17.048063
Merton                    17.291221
Newham                    18.938307
Redbridge                 14.732688
Richmond upon Thames      13.422343
Southwark                 17.101111
Sutton                    14.579452
Tower Hamlets             15.805230
Waltham Forest            16.885519
Wandsworth                14.926675
Westminster               10.382925
Name: Year 1999, dtype: float64
In [822]:
# Plotting proportion of violence against the person in all offences in year 1999
proportion_of_violence1999.plot.bar(title = 'Proportion of violence against the person in all offences (year 1999)'  )
Out[822]:
<matplotlib.axes._subplots.AxesSubplot at 0x20e61996cc0>
In [826]:
# Average proportion of violence against the person in all offences among all boroughs in year 1999 
print(statistics.mean(proportion_of_violence1999))
15.243543499421966
In [444]:
# Proportion of violence against the person in all offences in year 2016
proportion_of_violence2016 = (violence2016/all_offences2016) * 100 
In [445]:
print(proportion_of_violence2016)
Borough
Barking and Dagenham      34.793005
Barnet                    30.238062
Bexley                    34.542686
Brent                     33.816758
Bromley                   32.621934
Camden                    25.454904
Croydon                   34.930364
Ealing                    33.735855
Enfield                   32.292150
Greenwich                 36.358188
Hackney                   30.573938
Hammersmith and Fulham    29.315333
Haringey                  30.781786
Harrow                    31.753471
Havering                  34.870572
Hillingdon                33.649388
Hounslow                  32.447386
Islington                 28.089791
Kensington and Chelsea    23.305824
Kingston upon Thames      31.559675
Lambeth                   30.180951
Lewisham                  35.993028
Merton                    31.103618
Newham                    30.829469
Redbridge                 31.480281
Richmond upon Thames      27.321063
Southwark                 28.766029
Sutton                    34.410020
Tower Hamlets             31.824649
Waltham Forest            34.320353
Wandsworth                27.944639
Westminster               22.770783
Name: Year 2016, dtype: float64
In [482]:
# Plotting proportion of violence against the person in all offences in year 2016
proportion_of_violence2016.plot.bar(title = 'Proportion of violence against the person in all offences (year 2016)')
Out[482]:
<matplotlib.axes._subplots.AxesSubplot at 0x20e3f210a58>
In [827]:
#  Average proportion of violence against the person in all offences among all boroughs in year 2016
print(statistics.mean(proportion_of_violence2016))
31.314873477784044

Observations: The proportion of violence against the person in all offences roughly doubled between 1999 and 2016, from about 15% to 31% on average across boroughs.
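The two per-year proportion cells above could also be done in a single step, because pandas aligns element-wise division on both the index and the columns; `Series.mean()` then gives the same per-year averages as `statistics.mean`. A sketch on toy frames standing in for `violence` and `all_offences`:

```python
import pandas as pd

violence_toy = pd.DataFrame({'Year 1999': [20.0, 14.0], 'Year 2016': [25.0, 17.0]},
                            index=['Barking and Dagenham', 'Barnet'])
all_toy = pd.DataFrame({'Year 1999': [120.0, 98.0], 'Year 2016': [71.0, 56.0]},
                       index=['Barking and Dagenham', 'Barnet'])

# Element-wise division, aligned on borough index and year columns
share = violence_toy / all_toy * 100
print(share.mean().round(1).to_dict())
```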

Now let's see how violence against the person changed from year to year over the period between 1999 and 2016.

In [828]:
# Transposing so that year becomes the index, converting the 'violence' data frame to a time series. This will help with analysing trends in the violence rate.
violence_time_series = violence.T  
In [829]:
violence_time_series.head()
Out[829]:
Borough Barking and Dagenham Barnet Bexley Brent Bromley Camden Croydon Ealing Enfield Greenwich ... Merton Newham Redbridge Richmond upon Thames Southwark Sutton Tower Hamlets Waltham Forest Wandsworth Westminster
Year 1999 19.760656 13.538819 14.490154 20.770829 10.932445 28.115346 17.707323 22.008126 14.813131 29.382376 ... 15.708249 30.184977 15.205469 12.044079 30.175144 11.779791 27.828451 20.650782 20.070396 48.073011
Year 2000 21.989957 14.484584 14.671928 21.796977 12.867529 27.154465 18.929455 21.125604 17.442960 29.416428 ... 15.579502 29.918969 15.646105 11.513995 29.446911 11.391528 30.258759 21.708481 18.162461 43.175317
Year 2001 23.790551 15.155831 15.263512 21.096358 12.973553 27.018221 18.519182 22.266627 18.310936 28.773108 ... 16.498697 30.271319 15.382835 12.133486 30.228427 13.507035 31.776816 22.399387 18.407166 42.955014
Year 2002 26.401053 16.365519 16.223765 23.333370 15.937007 28.518620 22.783716 23.098415 19.921993 31.118496 ... 18.156916 31.713948 18.688107 14.517701 32.079573 16.908945 36.410006 24.251931 19.474969 47.563671
Year 2003 29.492810 18.076333 18.412297 27.470623 16.834145 31.426557 23.648558 23.206148 18.244718 30.561339 ... 19.096874 32.042276 18.923900 14.092854 34.915884 17.730751 37.018754 26.780894 19.275507 45.296646

5 rows × 32 columns

In [830]:
# Converting to a datetime index. The exact dates may differ slightly from the original data set, but they cover the same periods.
violence_time_series.index.name = 'Year'
rng_v = pd.date_range(start='1999-01-01', periods=18, freq='Y')
rng_v
Out[830]:
DatetimeIndex(['1999-12-31', '2000-12-31', '2001-12-31', '2002-12-31',
               '2003-12-31', '2004-12-31', '2005-12-31', '2006-12-31',
               '2007-12-31', '2008-12-31', '2009-12-31', '2010-12-31',
               '2011-12-31', '2012-12-31', '2013-12-31', '2014-12-31',
               '2015-12-31', '2016-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')
In [831]:
violence_time_series.set_index(rng_v, inplace=True)
In [832]:
violence_time_series.head()
Out[832]:
Borough Barking and Dagenham Barnet Bexley Brent Bromley Camden Croydon Ealing Enfield Greenwich ... Merton Newham Redbridge Richmond upon Thames Southwark Sutton Tower Hamlets Waltham Forest Wandsworth Westminster
1999-12-31 19.760656 13.538819 14.490154 20.770829 10.932445 28.115346 17.707323 22.008126 14.813131 29.382376 ... 15.708249 30.184977 15.205469 12.044079 30.175144 11.779791 27.828451 20.650782 20.070396 48.073011
2000-12-31 21.989957 14.484584 14.671928 21.796977 12.867529 27.154465 18.929455 21.125604 17.442960 29.416428 ... 15.579502 29.918969 15.646105 11.513995 29.446911 11.391528 30.258759 21.708481 18.162461 43.175317
2001-12-31 23.790551 15.155831 15.263512 21.096358 12.973553 27.018221 18.519182 22.266627 18.310936 28.773108 ... 16.498697 30.271319 15.382835 12.133486 30.228427 13.507035 31.776816 22.399387 18.407166 42.955014
2002-12-31 26.401053 16.365519 16.223765 23.333370 15.937007 28.518620 22.783716 23.098415 19.921993 31.118496 ... 18.156916 31.713948 18.688107 14.517701 32.079573 16.908945 36.410006 24.251931 19.474969 47.563671
2003-12-31 29.492810 18.076333 18.412297 27.470623 16.834145 31.426557 23.648558 23.206148 18.244718 30.561339 ... 19.096874 32.042276 18.923900 14.092854 34.915884 17.730751 37.018754 26.780894 19.275507 45.296646

5 rows × 32 columns
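The transpose-then-reindex step above can be sketched on a toy frame (borough names real, numbers made up for illustration):

```python
import pandas as pd

# Toy borough-by-year frame; the values are made up for illustration only
df = pd.DataFrame(
    {"Year 1999": [19.8, 13.5], "Year 2000": [22.0, 14.5]},
    index=["Barking and Dagenham", "Barnet"],
)

# Transpose so years become rows, then attach a proper DatetimeIndex
ts = df.T
ts.index = pd.to_datetime(["1999-12-31", "2000-12-31"])
```

Using explicit `pd.to_datetime` dates like this avoids depending on frequency-alias strings, which have changed across pandas versions.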

In [833]:
# Displaying the change in Violence against the person rate by borough between years 1999 and 2016
uv = violence_time_series.plot(colormap='tab10', fontsize=10, linewidth=4, figsize=(20,12) )

# Set labels and legend
uv.set_xlabel('Year', fontsize=15)
uv.set_ylabel('Violence Rate', fontsize=15)
uv.set_title('Violence against the person rate in London by borough (Years 1999-2016 )', fontsize=20)
uv.legend(loc='center left', bbox_to_anchor=(1.0, 0.5), fontsize=15)
Out[833]:
<matplotlib.legend.Legend at 0x20e6177e940>

The above graph is cluttered and trends are hard to see. Let's split it into separate subplots for a clearer picture.

In [834]:
uvv = violence_time_series.plot(subplots=True,
                      layout=(6, 6),
                      sharex=True,
                      sharey=False,
                      linewidth=0.7,
                      fontsize=3,
                      figsize=(15,15),
                      legend=True)

plt.show()

Observations: in the time series above it is difficult to see clear trends in violence; rates move up and down from year to year and show more complex behaviour than the 'all offences' pattern. Interestingly, most boroughs saw a drop in violence around 2014 followed by a sharp rise between 2014 and 2016.

Instead of focusing on each borough, let's compare how the average violence rate changed relative to the average all offences rate in London.

In [835]:
# Finding and plotting London average all offences rate year-on-year.   
plt.bar(height = np.mean(all_offences), x = all_offences.columns )

plt.xticks(rotation=90)
plt.title('Average all offences rate in London by year ')
Out[835]:
Text(0.5, 1.0, 'Average all offences rate in London by year ')

Observation: As before, there is a clear downward trend in the all offences rate.

In [836]:
# Finding and plotting London average violence rate year-on-year. 
plt.bar(height = np.mean(violence), x = violence.columns )

plt.xticks(rotation=90)
plt.title('Average violence rate in London by year ')
Out[836]:
Text(0.5, 1.0, 'Average violence rate in London by year ')

Observations: Although the average violence rate fluctuates from year to year, over the long run it stays approximately the same.
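The year-to-year movement can be made more explicit with `pct_change()` instead of comparing raw bar heights; a minimal sketch with made-up average rates:

```python
import pandas as pd

# Hypothetical London-average violence rates (illustrative values only)
avg_violence = pd.Series([20.0, 21.0, 19.95],
                         index=["Year 1999", "Year 2000", "Year 2001"])

# Year-on-year percentage change: +5% then -5%, netting out to roughly flat
yoy = avg_violence.pct_change() * 100
```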

In [837]:
# Fitting a simple linear regression model on a year-on-year basis to test the change in all offences rates.
# We fit these models by OLS without an intercept, so each slope acts as a year-on-year growth factor.


a_coefs = [0]*17

for year_index in range(17):

  X = all_offences.iloc[:,year_index]
  y = all_offences.iloc[:,year_index + 1]

  model = sm.OLS(y, X).fit()
  #print(model.summary())
  print((model.params)[0])
  print((model.pvalues)[0])
  print('\t')

  a_coefs[year_index] = (model.params)[0]
0.9527834282522719
2.472031616926354e-46
	
1.0365569094518956
1.6965888242450494e-41
	
1.0076494535149665
4.050837313267017e-46
	
0.9646264040165837
2.6981204584485847e-40
	
0.9469412139948065
4.859110552151237e-43
	
0.9410471276692942
2.031572322008197e-40
	
0.9316059821945715
5.42581070074363e-43
	
0.9266124146961573
5.725340268797313e-40
	
0.9650310783082783
3.702201247187692e-43
	
0.9709883854702102
5.0965010382496606e-45
	
0.9846385271113245
9.466917291425185e-48
	
0.9808539632891708
6.125443858135701e-43
	
0.9334148108908799
1.4781681666230585e-43
	
0.8863225177790681
5.151955851934811e-47
	
0.9772747605411831
1.297070266637168e-37
	
1.0218232459553334
3.281455220331373e-49
	
0.8541698340493351
5.549362785121011e-48
	
In [838]:
np.mean(a_coefs)
Out[838]:
0.9577847092461959

Observations: The vast majority of fitted coefficients are clearly below one, which implies a decreasing all offences rate. Note that the reported p-values test whether each slope differs from zero, not from one; their extreme significance (all < 0.0001) confirms a strong year-to-year relationship rather than the decline itself.
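Because these models have no intercept, each slope has the closed form sum(x*y) / sum(x*x), which makes its interpretation as a growth factor transparent. A minimal NumPy sketch with made-up rates:

```python
import numpy as np

# Made-up offence rates for one year (x) and the next (y = 5% lower)
x = np.array([100.0, 110.0, 120.0])
y = 0.95 * x

# OLS through the origin: slope = sum(x*y) / sum(x*x)
slope = x.dot(y) / x.dot(x)  # 0.95, i.e. a 5% year-on-year decline
```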

In [842]:
# Fitting simple linear regression models on a year-on-year basis to test the change in violence rates.
# We fit these models by OLS, as above.
v_coefs = [0]*17
for year_index in range(17):

  X = violence.iloc[:,year_index]
  y = violence.iloc[:,year_index + 1]

  model = sm.OLS(y, X).fit()
  #print(model.summary())
  print((model.params)[0])
  print((model.pvalues)[0])
  print('\t')

  v_coefs[year_index] = (model.params)[0]
0.9888693271860681
9.255504643857128e-38
	
1.019055916486212
7.785707687924748e-43
	
1.0930505718097407
1.4599057766026831e-40
	
1.0329193840022564
4.069933224811548e-39
	
1.0728699326843547
4.584425323523383e-38
	
0.9604515793809032
4.003066730267187e-37
	
0.9140234484800942
8.915109519433956e-37
	
0.9358853175735171
3.267071305289358e-37
	
0.9901464497541195
7.808862133474935e-39
	
0.97909607100088
3.0834943365433488e-40
	
0.9387197025968659
3.184017534453766e-38
	
0.908422249750459
4.552162892098322e-42
	
0.9499529896303698
4.467809448047844e-37
	
1.0085852645013667
2.0229421940619694e-40
	
1.2559345538997544
4.99147502705328e-40
	
1.1272727952465766
7.517014324590081e-45
	
0.8625802403997657
4.022144897268634e-45
	
In [843]:
np.mean(v_coefs)
Out[843]:
1.0022256349637237

Observations: Some of the coefficients are < 1 and some are > 1, meaning the violence rate both rose and fell from year to year. However, the average coefficient is slightly above one, so we can conclude there was a slight overall increase in the violence rate between 1999 and 2016.
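Since each coefficient acts as a multiplicative year-on-year growth factor, the geometric mean is arguably a more appropriate average than the arithmetic mean used above; a small sketch of the difference, with made-up factors:

```python
import numpy as np

# Growth factors compound multiplicatively, so average them geometrically
factors = np.array([2.0, 0.5])          # doubling then halving: no net change
arith = factors.mean()                  # 1.25, overstates the net growth
geom = np.exp(np.log(factors).mean())   # 1.0, the correct net change
```

For coefficients close to one, as here, the two averages differ only marginally, so the conclusion above is unlikely to change.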

In [844]:
# Difference in all offences coefficients and violence against the person coefficients
diffs = [a_coefs[i] - v_coefs[i] for i in range(len(v_coefs))]
plt.bar(height = diffs, x = all_offences.columns[0:17])
plt.xticks(rotation=90)
Out[844]:
([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
 <a list of 17 Text xticklabel objects>)

Part 2: Relationship between violent crime and socioeconomic factors in London

Loading datasets to extract the required socioeconomic factors

Proportion of pubs/bars and night clubs/social clubs among all food/drink businesses (as an indicator of alcohol availability).

In [845]:
# Loading pubs and bars dataset
pubs_bars = pd.read_excel('pubs.xlsx', sheet_name = 'Pubs units', header = [3])
In [846]:
pubs_bars.head()
Out[846]:
Unnamed: 0 Unnamed: 1 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 E12000007 London 4835.0 4875.0 4530.0 4455.0 4445.0 4335.0 4295.0 4210.0 4025.0 3890.0 3770.0 3770.0 3665.0 3615.0 3675.0 3615.0 3530.0
2 E09000001 City of London 200.0 195.0 180.0 180.0 200.0 185.0 200.0 195.0 185.0 190.0 175.0 165.0 155.0 160.0 150.0 150.0 160.0
3 E09000002 Barking and Dagenham 45.0 40.0 45.0 40.0 40.0 40.0 30.0 30.0 25.0 25.0 25.0 30.0 20.0 20.0 25.0 20.0 20.0
4 E09000003 Barnet 135.0 135.0 130.0 120.0 125.0 120.0 115.0 120.0 110.0 95.0 95.0 95.0 85.0 90.0 95.0 100.0 80.0
In [847]:
# The rows needed for our analysis contain no missing values. The all-NaN rows are blank/footnote rows introduced when loading the Excel file; they will be removed.
pubs_bars.isnull().sum()
Out[847]:
Unnamed: 0     4
Unnamed: 1    21
2001          21
2002          21
2003          21
2004          21
2005          21
2006          21
2007          21
2008          21
2009          21
2010          21
2011          21
2012          21
2013          21
2014          21
2015          21
2016          21
2017          21
dtype: int64
In [848]:
# Removing the rows with missing values.
pubs_bars.dropna(inplace=True)
In [849]:
# Dropping the area-code column and setting the borough name as the index.
pubs_bars.drop('Unnamed: 0', axis=1, inplace=True)
pubs_bars.set_index('Unnamed: 1', inplace=True)
pubs_bars.index.name = 'Borough'
pubs_bars.head()
Out[849]:
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Borough
London 4835.0 4875.0 4530.0 4455.0 4445.0 4335.0 4295.0 4210.0 4025.0 3890.0 3770.0 3770.0 3665.0 3615.0 3675.0 3615.0 3530.0
City of London 200.0 195.0 180.0 180.0 200.0 185.0 200.0 195.0 185.0 190.0 175.0 165.0 155.0 160.0 150.0 150.0 160.0
Barking and Dagenham 45.0 40.0 45.0 40.0 40.0 40.0 30.0 30.0 25.0 25.0 25.0 30.0 20.0 20.0 25.0 20.0 20.0
Barnet 135.0 135.0 130.0 120.0 125.0 120.0 115.0 120.0 110.0 95.0 95.0 95.0 85.0 90.0 95.0 100.0 80.0
Bexley 95.0 95.0 95.0 100.0 105.0 105.0 105.0 105.0 90.0 95.0 95.0 95.0 95.0 85.0 90.0 90.0 95.0
In [850]:
pubs_bars.shape
Out[850]:
(34, 17)
In [851]:
pubs_bars.head()
Out[851]:
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Borough
London 4835.0 4875.0 4530.0 4455.0 4445.0 4335.0 4295.0 4210.0 4025.0 3890.0 3770.0 3770.0 3665.0 3615.0 3675.0 3615.0 3530.0
City of London 200.0 195.0 180.0 180.0 200.0 185.0 200.0 195.0 185.0 190.0 175.0 165.0 155.0 160.0 150.0 150.0 160.0
Barking and Dagenham 45.0 40.0 45.0 40.0 40.0 40.0 30.0 30.0 25.0 25.0 25.0 30.0 20.0 20.0 25.0 20.0 20.0
Barnet 135.0 135.0 130.0 120.0 125.0 120.0 115.0 120.0 110.0 95.0 95.0 95.0 85.0 90.0 95.0 100.0 80.0
Bexley 95.0 95.0 95.0 100.0 105.0 105.0 105.0 105.0 90.0 95.0 95.0 95.0 95.0 85.0 90.0 90.0 95.0
In [852]:
# Loading licensed restaurants dataset
licensed_restaurants = pd.read_excel('licensed-restaurants-cafes-borough.xlsx', sheet_name = 'Restaurants units', header = [3])
In [853]:
licensed_restaurants.head()
Out[853]:
Unnamed: 0 Unnamed: 1 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 E12000007 London 5455.0 5525.0 5540.0 5575.0 5915.0 5940.0 6175.0 5985.0 5910.0 6175.0 6210.0 6655.0 6715.0 7035.0 7515.0 7745.0 7990.0
2 E09000001 City of London 180.0 190.0 190.0 175.0 180.0 170.0 205.0 185.0 180.0 195.0 190.0 205.0 230.0 255.0 270.0 270.0 295.0
3 E09000002 Barking and Dagenham 30.0 30.0 30.0 35.0 35.0 35.0 35.0 35.0 30.0 30.0 30.0 40.0 40.0 35.0 45.0 40.0 35.0
4 E09000003 Barnet 205.0 195.0 200.0 200.0 200.0 200.0 205.0 195.0 210.0 215.0 205.0 235.0 250.0 240.0 255.0 245.0 270.0
In [854]:
# Applying the same transformations as for the ‘pubs_bars’ data frame
licensed_restaurants.drop('Unnamed: 0', axis=1, inplace=True)
licensed_restaurants.set_index('Unnamed: 1', inplace=True)
licensed_restaurants.index.name = 'Borough'
In [855]:
# The rows needed for our analysis contain no missing values. The all-NaN rows are blank/footnote rows introduced when loading the Excel file; they will be removed.
licensed_restaurants.isnull()
Out[855]:
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Borough
NaN True True True True True True True True True True True True True True True True True
London False False False False False False False False False False False False False False False False False
City of London False False False False False False False False False False False False False False False False False
Barking and Dagenham False False False False False False False False False False False False False False False False False
Barnet False False False False False False False False False False False False False False False False False
Bexley False False False False False False False False False False False False False False False False False
Brent False False False False False False False False False False False False False False False False False
Bromley False False False False False False False False False False False False False False False False False
Camden False False False False False False False False False False False False False False False False False
Croydon False False False False False False False False False False False False False False False False False
Ealing False False False False False False False False False False False False False False False False False
Enfield False False False False False False False False False False False False False False False False False
Greenwich False False False False False False False False False False False False False False False False False
Hackney False False False False False False False False False False False False False False False False False
Hammersmith and Fulham False False False False False False False False False False False False False False False False False
Haringey False False False False False False False False False False False False False False False False False
Harrow False False False False False False False False False False False False False False False False False
Havering False False False False False False False False False False False False False False False False False
Hillingdon False False False False False False False False False False False False False False False False False
Hounslow False False False False False False False False False False False False False False False False False
Islington False False False False False False False False False False False False False False False False False
Kensington and Chelsea False False False False False False False False False False False False False False False False False
Kingston upon Thames False False False False False False False False False False False False False False False False False
Lambeth False False False False False False False False False False False False False False False False False
Lewisham False False False False False False False False False False False False False False False False False
Merton False False False False False False False False False False False False False False False False False
Newham False False False False False False False False False False False False False False False False False
Redbridge False False False False False False False False False False False False False False False False False
Richmond upon Thames False False False False False False False False False False False False False False False False False
Southwark False False False False False False False False False False False False False False False False False
Sutton False False False False False False False False False False False False False False False False False
Tower Hamlets False False False False False False False False False False False False False False False False False
Waltham Forest False False False False False False False False False False False False False False False False False
Wandsworth False False False False False False False False False False False False False False False False False
Westminster False False False False False False False False False False False False False False False False False
NaN True True True True True True True True True True True True True True True True True
... (19 further identical all-NaN rows omitted)
In [856]:
# The rows we need have no missing values; removing the remaining rows, which do.
licensed_restaurants.dropna(inplace=True)
In [857]:
licensed_restaurants.head()
Out[857]:
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Borough
London 5455.0 5525.0 5540.0 5575.0 5915.0 5940.0 6175.0 5985.0 5910.0 6175.0 6210.0 6655.0 6715.0 7035.0 7515.0 7745.0 7990.0
City of London 180.0 190.0 190.0 175.0 180.0 170.0 205.0 185.0 180.0 195.0 190.0 205.0 230.0 255.0 270.0 270.0 295.0
Barking and Dagenham 30.0 30.0 30.0 35.0 35.0 35.0 35.0 35.0 30.0 30.0 30.0 40.0 40.0 35.0 45.0 40.0 35.0
Barnet 205.0 195.0 200.0 200.0 200.0 200.0 205.0 195.0 210.0 215.0 205.0 235.0 250.0 240.0 255.0 245.0 270.0
Bexley 85.0 80.0 75.0 80.0 85.0 95.0 95.0 95.0 90.0 95.0 95.0 105.0 105.0 105.0 100.0 110.0 110.0
In [858]:
licensed_restaurants.shape
Out[858]:
(34, 17)
In [859]:
# Loading unlicensed restaurants dataset
unlicensed_restaurants = pd.read_excel('unlicensed-restaurants.xlsx', sheet_name = 'Unlicensed Restaurants units', header = [3])
In [860]:
# Applying the same transformations as for the ‘pubs_bars’ data frame
unlicensed_restaurants.drop('Unnamed: 0', axis=1, inplace=True)
unlicensed_restaurants.dropna(inplace=True)
unlicensed_restaurants.set_index('Unnamed: 1', inplace=True)
unlicensed_restaurants.index.name = 'Borough'
In [861]:
unlicensed_restaurants.head()
Out[861]:
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Borough
London 2280.0 2435.0 2905.0 3060.0 3285.0 3440.0 3480.0 3730.0 3760.0 3390.0 3385.0 3740.0 3945.0 4535.0 5295.0 5520.0 5785.0
City of London 50.0 45.0 85.0 110.0 115.0 110.0 115.0 130.0 120.0 125.0 125.0 140.0 135.0 160.0 180.0 180.0 190.0
Barking and Dagenham 25.0 25.0 25.0 35.0 40.0 40.0 35.0 35.0 35.0 30.0 30.0 35.0 45.0 45.0 60.0 60.0 65.0
Barnet 130.0 145.0 155.0 160.0 170.0 210.0 200.0 210.0 200.0 190.0 185.0 180.0 190.0 210.0 245.0 200.0 215.0
Bexley 30.0 35.0 40.0 40.0 45.0 45.0 45.0 50.0 50.0 50.0 50.0 50.0 55.0 60.0 75.0 75.0 85.0
In [862]:
unlicensed_restaurants.shape
Out[862]:
(34, 17)
In [863]:
# Loading takeaway restaurants dataset
take_away = pd.read_excel('take.away.xlsx', sheet_name = 'Takeaway and food stand units', header = [3])
In [864]:
take_away.head()
Out[864]:
Unnamed: 0 Unnamed: 1 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 E12000007 London 3970.0 4015.0 3935.0 4075.0 4175.0 4240.0 4240.0 4240.0 4255.0 4210.0 4150.0 4400.0 4390.0 4585.0 5190.0 5275.0 5300.0
2 E09000001 City of London 120.0 135.0 130.0 135.0 145.0 145.0 145.0 140.0 135.0 160.0 160.0 160.0 160.0 155.0 150.0 160.0 155.0
3 E09000002 Barking and Dagenham 90.0 95.0 95.0 90.0 90.0 90.0 85.0 95.0 95.0 95.0 95.0 95.0 95.0 105.0 125.0 130.0 135.0
4 E09000003 Barnet 120.0 115.0 120.0 130.0 135.0 140.0 120.0 135.0 140.0 135.0 135.0 140.0 145.0 155.0 165.0 180.0 180.0
In [865]:
# Applying the same transformations as for the ‘pubs_bars’ data frame
take_away.drop('Unnamed: 0', axis=1, inplace=True)
take_away.dropna(inplace=True)
take_away.set_index('Unnamed: 1', inplace=True)
take_away.index.name = 'Borough'
In [866]:
take_away.head()
Out[866]:
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Borough
London 3970.0 4015.0 3935.0 4075.0 4175.0 4240.0 4240.0 4240.0 4255.0 4210.0 4150.0 4400.0 4390.0 4585.0 5190.0 5275.0 5300.0
City of London 120.0 135.0 130.0 135.0 145.0 145.0 145.0 140.0 135.0 160.0 160.0 160.0 160.0 155.0 150.0 160.0 155.0
Barking and Dagenham 90.0 95.0 95.0 90.0 90.0 90.0 85.0 95.0 95.0 95.0 95.0 95.0 95.0 105.0 125.0 130.0 135.0
Barnet 120.0 115.0 120.0 130.0 135.0 140.0 120.0 135.0 140.0 135.0 135.0 140.0 145.0 155.0 165.0 180.0 180.0
Bexley 105.0 115.0 105.0 105.0 115.0 115.0 120.0 100.0 105.0 100.0 100.0 110.0 100.0 105.0 125.0 135.0 125.0
In [867]:
take_away.shape
Out[867]:
(34, 17)
In [868]:
# Loading clubs dataset
licensed_clubs = pd.read_excel('licensed-clubs-borough.xlsx', sheet_name = 'Clubs units', header = [3])
In [869]:
licensed_clubs.head()
Out[869]:
Unnamed: 0 Unnamed: 1 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 E12000007 London 880.0 845.0 820.0 780.0 760.0 765.0 770.0 770.0 760.0 710.0 685.0 700.0 655.0 645.0 620.0 595.0 570.0
2 E09000001 City of London 5.0 5.0 5.0 10.0 5.0 10.0 10.0 10.0 15.0 15.0 10.0 10.0 10.0 10.0 10.0 15.0 15.0
3 E09000002 Barking and Dagenham 20.0 20.0 20.0 20.0 20.0 20.0 20.0 15.0 20.0 15.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0
4 E09000003 Barnet 30.0 25.0 25.0 20.0 20.0 20.0 20.0 20.0 20.0 15.0 20.0 20.0 15.0 20.0 20.0 15.0 10.0
In [870]:
# Applying the same transformations as for the ‘pubs_bars’ data frame
licensed_clubs.drop('Unnamed: 0', axis=1, inplace=True)
licensed_clubs.dropna(inplace=True)
licensed_clubs.set_index('Unnamed: 1', inplace=True)
licensed_clubs.index.name = 'Borough'
In [871]:
licensed_clubs.head()
Out[871]:
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Borough
London 880.0 845.0 820.0 780.0 760.0 765.0 770.0 770.0 760.0 710.0 685.0 700.0 655.0 645.0 620.0 595.0 570.0
City of London 5.0 5.0 5.0 10.0 5.0 10.0 10.0 10.0 15.0 15.0 10.0 10.0 10.0 10.0 10.0 15.0 15.0
Barking and Dagenham 20.0 20.0 20.0 20.0 20.0 20.0 20.0 15.0 20.0 15.0 10.0 10.0 10.0 10.0 10.0 10.0 10.0
Barnet 30.0 25.0 25.0 20.0 20.0 20.0 20.0 20.0 20.0 15.0 20.0 20.0 15.0 20.0 20.0 15.0 10.0
Bexley 25.0 25.0 25.0 20.0 20.0 25.0 20.0 20.0 25.0 25.0 20.0 20.0 20.0 20.0 20.0 20.0 20.0
In [872]:
licensed_clubs.shape
Out[872]:
(34, 17)
In [873]:
# Total of all food and drink businesses in each borough
all_food_alcohol_places = pubs_bars + licensed_restaurants + unlicensed_restaurants + take_away + licensed_clubs
In [874]:
all_food_alcohol_places.head()
Out[874]:
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Borough
London 17420.0 17695.0 17730.0 17945.0 18580.0 18720.0 18960.0 18935.0 18710.0 18375.0 18200.0 19265.0 19370.0 20415.0 22295.0 22750.0 23175.0
City of London 555.0 570.0 590.0 610.0 645.0 620.0 675.0 660.0 635.0 685.0 660.0 680.0 690.0 740.0 760.0 775.0 815.0
Barking and Dagenham 210.0 210.0 215.0 220.0 225.0 225.0 205.0 210.0 205.0 195.0 190.0 210.0 210.0 215.0 265.0 260.0 265.0
Barnet 620.0 615.0 630.0 630.0 650.0 690.0 660.0 680.0 680.0 650.0 640.0 670.0 685.0 715.0 780.0 740.0 755.0
Bexley 340.0 350.0 340.0 345.0 370.0 385.0 385.0 370.0 360.0 365.0 360.0 380.0 375.0 375.0 410.0 430.0 435.0
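The sum above works because pandas arithmetic aligns DataFrames on both index and columns before operating, so each borough-year cell is added to its counterpart in the other frames. A minimal sketch with made-up values (the frame names and numbers here are illustrative, not from the data):

```python
import pandas as pd

# Two small frames sharing index labels and columns, mirroring the
# borough-by-year layout used above (values are made up for illustration).
pubs = pd.DataFrame({'2001': [100.0, 50.0]}, index=['Barnet', 'Bexley'])
clubs = pd.DataFrame({'2001': [30.0, 25.0]}, index=['Barnet', 'Bexley'])

# `+` aligns on index and columns; labels present in only one frame
# would produce NaN, which is why the frames were cleaned first.
total = pubs + clubs
print(total.loc['Barnet', '2001'])  # 130.0
```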
In [875]:
# Proportion of pubs and bars to all food/drink businesses
proportion_of_pubs_bars = (pubs_bars/all_food_alcohol_places) * 100
In [876]:
proportion_of_pubs_bars.head()
Out[876]:
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Borough
London 27.755454 27.550155 25.549915 24.825857 23.923574 23.157051 22.652954 22.233958 21.512560 21.170068 20.714286 19.569167 18.921012 17.707568 16.483516 15.890110 15.231931
City of London 36.036036 34.210526 30.508475 29.508197 31.007752 29.838710 29.629630 29.545455 29.133858 27.737226 26.515152 24.264706 22.463768 21.621622 19.736842 19.354839 19.631902
Barking and Dagenham 21.428571 19.047619 20.930233 18.181818 17.777778 17.777778 14.634146 14.285714 12.195122 12.820513 13.157895 14.285714 9.523810 9.302326 9.433962 7.692308 7.547170
Barnet 21.774194 21.951220 20.634921 19.047619 19.230769 17.391304 17.424242 17.647059 16.176471 14.615385 14.843750 14.179104 12.408759 12.587413 12.179487 13.513514 10.596026
Bexley 27.941176 27.142857 27.941176 28.985507 28.378378 27.272727 27.272727 28.378378 25.000000 26.027397 26.388889 25.000000 25.333333 22.666667 21.951220 20.930233 21.839080
In [877]:
# Removing the rows and columns that are not needed for our analysis
p_b = proportion_of_pubs_bars.iloc[2:34, 1:16]
In [878]:
p_b.reset_index(inplace=True)
p_b.head()
Out[878]:
Borough 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
0 Barking and Dagenham 19.047619 20.930233 18.181818 17.777778 17.777778 14.634146 14.285714 12.195122 12.820513 13.157895 14.285714 9.523810 9.302326 9.433962 7.692308
1 Barnet 21.951220 20.634921 19.047619 19.230769 17.391304 17.424242 17.647059 16.176471 14.615385 14.843750 14.179104 12.408759 12.587413 12.179487 13.513514
2 Bexley 27.142857 27.941176 28.985507 28.378378 27.272727 27.272727 28.378378 25.000000 26.027397 26.388889 25.000000 25.333333 22.666667 21.951220 20.930233
3 Brent 24.731183 22.340426 21.649485 21.428571 19.801980 19.000000 20.202020 19.191919 18.750000 19.791667 17.647059 17.708333 15.000000 14.912281 12.295082
4 Bromley 27.551020 25.742574 24.000000 24.038462 22.857143 22.429907 22.935780 22.935780 21.698113 20.588235 19.266055 17.699115 17.857143 17.741935 15.748031
In [879]:
p_b.shape
Out[879]:
(32, 16)
In [880]:
# Reshaping the data frame to long format for later concatenation with the other data frames
pubs_bars_new = pd.melt(p_b, id_vars = ['Borough'], var_name = 'Year', value_name = 'Proportion of pubs/bars')
In [881]:
pubs_bars_new.head(35)
Out[881]:
Borough Year Proportion of pubs/bars
0 Barking and Dagenham 2002 19.047619
1 Barnet 2002 21.951220
2 Bexley 2002 27.142857
3 Brent 2002 24.731183
4 Bromley 2002 27.551020
5 Camden 2002 25.108225
6 Croydon 2002 29.508197
7 Ealing 2002 28.301887
8 Enfield 2002 29.069767
9 Greenwich 2002 32.857143
10 Hackney 2002 36.046512
11 Hammersmith and Fulham 2002 26.732673
12 Haringey 2002 30.000000
13 Harrow 2002 23.880597
14 Havering 2002 29.032258
15 Hillingdon 2002 30.434783
16 Hounslow 2002 30.769231
17 Islington 2002 35.099338
18 Kensington and Chelsea 2002 19.594595
19 Kingston upon Thames 2002 29.850746
20 Lambeth 2002 31.496063
21 Lewisham 2002 30.434783
22 Merton 2002 22.580645
23 Newham 2002 30.769231
24 Redbridge 2002 21.212121
25 Richmond upon Thames 2002 31.372549
26 Southwark 2002 34.645669
27 Sutton 2002 27.272727
28 Tower Hamlets 2002 35.087719
29 Waltham Forest 2002 30.158730
30 Wandsworth 2002 24.409449
31 Westminster 2002 20.675105
32 Barking and Dagenham 2003 20.930233
33 Barnet 2003 20.634921
34 Bexley 2003 27.941176
In [882]:
pubs_bars_new.shape
Out[882]:
(480, 3)
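The melt turns each (Borough, year) cell of the wide table into its own row, which is the long format needed before joining with the other indicators on a shared Borough/Year key. A small self-contained sketch (illustrative values, not from the data):

```python
import pandas as pd

# A minimal wide table, one row per borough and one column per year,
# mirroring the shape of `p_b` above; values are made up.
wide = pd.DataFrame({
    'Borough': ['Barnet', 'Bexley'],
    '2002': [21.95, 27.14],
    '2003': [20.63, 27.94],
})

# Each year column is stacked into rows keyed by Borough and Year.
long = pd.melt(wide, id_vars=['Borough'], var_name='Year',
               value_name='Proportion of pubs/bars')
print(long.shape)  # (4, 3)
```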
In [883]:
# Proportion of clubs to all food/drink businesses
proportion_of_clubs = (licensed_clubs/all_food_alcohol_places) * 100
In [884]:
proportion_of_clubs.head()
Out[884]:
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
Borough
London 5.051665 4.775360 4.624929 4.346615 4.090420 4.086538 4.061181 4.066543 4.061999 3.863946 3.763736 3.633532 3.381518 3.159442 2.780893 2.615385 2.459547
City of London 0.900901 0.877193 0.847458 1.639344 0.775194 1.612903 1.481481 1.515152 2.362205 2.189781 1.515152 1.470588 1.449275 1.351351 1.315789 1.935484 1.840491
Barking and Dagenham 9.523810 9.523810 9.302326 9.090909 8.888889 8.888889 9.756098 7.142857 9.756098 7.692308 5.263158 4.761905 4.761905 4.651163 3.773585 3.846154 3.773585
Barnet 4.838710 4.065041 3.968254 3.174603 3.076923 2.898551 3.030303 2.941176 2.941176 2.307692 3.125000 2.985075 2.189781 2.797203 2.564103 2.027027 1.324503
Bexley 7.352941 7.142857 7.352941 5.797101 5.405405 6.493506 5.194805 5.405405 6.944444 6.849315 5.555556 5.263158 5.333333 5.333333 4.878049 4.651163 4.597701
In [885]:
# Removing the rows and columns that are not needed for our analysis
clubs = proportion_of_clubs.iloc[2:34, 1:16]
clubs.reset_index(inplace=True)
clubs.head()
Out[885]:
Borough 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
0 Barking and Dagenham 9.523810 9.302326 9.090909 8.888889 8.888889 9.756098 7.142857 9.756098 7.692308 5.263158 4.761905 4.761905 4.651163 3.773585 3.846154
1 Barnet 4.065041 3.968254 3.174603 3.076923 2.898551 3.030303 2.941176 2.941176 2.307692 3.125000 2.985075 2.189781 2.797203 2.564103 2.027027
2 Bexley 7.142857 7.352941 5.797101 5.405405 6.493506 5.194805 5.405405 6.944444 6.849315 5.555556 5.263158 5.333333 5.333333 4.878049 4.651163
3 Brent 6.451613 6.382979 5.154639 5.102041 3.960396 4.000000 4.040404 4.040404 4.166667 4.166667 3.921569 3.125000 3.000000 2.631579 2.459016
4 Bromley 7.142857 6.930693 6.000000 5.769231 4.761905 4.672897 4.587156 4.587156 4.716981 4.901961 4.587156 4.424779 4.464286 3.225806 3.149606
In [886]:
clubs.shape
Out[886]:
(32, 16)
In [887]:
# Reshaping the data frame to long format for later concatenation with the other data frames
clubs_new = pd.melt(clubs, id_vars = ['Borough'], var_name = 'Year', value_name = 'Proportion of clubs')
In [888]:
clubs_new.head()
Out[888]:
Borough Year Proportion of clubs
0 Barking and Dagenham 2002 9.523810
1 Barnet 2002 4.065041
2 Bexley 2002 7.142857
3 Brent 2002 6.451613
4 Bromley 2002 7.142857

Job Seekers Allowance Claimants (as an indicator of unemployment)

In [889]:
# Importing Job Seekers Allowance Claimants file (Rates are as a percentage of all people working age from ONS mid-year estimates)
job_seekers_df = pd.read_excel('job-seekers-allowance-borough.xls', sheet_name = 'Rates')
In [890]:
job_seekers_df.head()
Out[890]:
Code Area 1999-08-01 00:00:00 1999-11-01 00:00:00 2000-02-01 00:00:00 2000-05-01 00:00:00 2000-08-01 00:00:00 2000-11-01 00:00:00 2001-02-01 00:00:00 2001-05-01 00:00:00 ... 2016-08-01 00:00:00 2016-11-01 00:00:00 2017-02-01 00:00:00 2017-05-01 00:00:00 2017-08-01 00:00:00 2017-11-01 00:00:00 2018-02-01 00:00:00 2018-05-01 00:00:00 2018-08-01 00:00:00 2018-11-01 00:00:00
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 E09000001 City of London 3.024803 2.621496 2.599814 2.228412 2.042711 2.042711 1.919051 1.919051 ... 0.727590 0.582072 0.796813 0.597610 0.597610 0.597610 0.597610 0.398406 0.398406 0.398406
2 E09000002 Barking and Dagenham 3.722134 3.481349 3.615886 3.398538 3.270105 2.953962 3.109531 3.002974 ... 1.829053 1.759887 1.784375 1.761883 1.686910 1.701904 1.889339 1.521967 1.147099 0.899685
3 E09000003 Barnet 2.826175 2.526876 2.555127 2.283614 2.259372 2.138161 2.126637 2.140910 ... 1.042503 0.990578 1.064502 1.020482 1.028485 1.008476 1.120529 1.040491 0.720340 0.512242
4 E09000004 Bexley 2.520525 2.265558 2.325143 2.013676 1.962972 1.905024 1.894885 1.757469 ... 0.947677 0.889258 0.935985 0.948895 0.903710 0.897255 0.994081 0.955350 0.903710 0.735878

5 rows × 80 columns

In [891]:
# Removing the rows and columns that are not needed for our analysis. The data contains four quarterly figures per year, so to stay consistent with the other data frames we keep only the November figure for each year
job_seekers = job_seekers_df.iloc[2:34,[1,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71]]
In [892]:
job_seekers.head()
Out[892]:
Area 2002-11-01 00:00:00 2003-11-01 00:00:00 2004-11-01 00:00:00 2005-11-01 00:00:00 2006-11-01 00:00:00 2007-11-01 00:00:00 2008-11-01 00:00:00 2009-11-01 00:00:00 2010-11-01 00:00:00 2011-11-01 00:00:00 2012-11-01 00:00:00 2013-11-01 00:00:00 2014-11-01 00:00:00 2015-11-01 00:00:00 2016-11-01 00:00:00
2 Barking and Dagenham 2.952248 3.224947 3.165277 3.702238 3.624745 3.236276 4.017825 5.428969 5.383264 6.142692 5.459403 4.213242 2.828289 2.126141 1.759887
3 Barnet 2.509677 2.381042 2.136076 2.271601 2.174120 1.646113 2.030966 2.946373 2.671089 2.893046 2.718031 2.139966 1.537465 1.184450 0.990578
4 Bexley 1.811202 1.815898 1.770328 1.916928 1.939469 1.446753 2.174094 3.190996 2.790570 3.135072 2.856872 2.189345 1.466353 1.168869 0.889258
5 Brent 3.780913 3.731985 3.661067 3.809729 3.936023 3.274908 3.532214 5.025038 4.970964 4.318946 4.369714 3.561057 2.647838 2.094117 1.617005
6 Bromley 1.868097 1.858206 1.756767 1.852054 1.687616 1.405241 1.996908 2.838646 2.542845 2.993951 2.798361 2.037068 1.311893 1.012404 0.853916
In [893]:
# Slightly changing the column names for readability 
job_seekers.columns = ['Borough','2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016']
In [894]:
job_seekers.head()
Out[894]:
Borough 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
2 Barking and Dagenham 2.952248 3.224947 3.165277 3.702238 3.624745 3.236276 4.017825 5.428969 5.383264 6.142692 5.459403 4.213242 2.828289 2.126141 1.759887
3 Barnet 2.509677 2.381042 2.136076 2.271601 2.174120 1.646113 2.030966 2.946373 2.671089 2.893046 2.718031 2.139966 1.537465 1.184450 0.990578
4 Bexley 1.811202 1.815898 1.770328 1.916928 1.939469 1.446753 2.174094 3.190996 2.790570 3.135072 2.856872 2.189345 1.466353 1.168869 0.889258
5 Brent 3.780913 3.731985 3.661067 3.809729 3.936023 3.274908 3.532214 5.025038 4.970964 4.318946 4.369714 3.561057 2.647838 2.094117 1.617005
6 Bromley 1.868097 1.858206 1.756767 1.852054 1.687616 1.405241 1.996908 2.838646 2.542845 2.993951 2.798361 2.037068 1.311893 1.012404 0.853916
In [895]:
job_seekers.reset_index(drop=True)
Out[895]:
Borough 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
0 Barking and Dagenham 2.952248 3.224947 3.165277 3.702238 3.624745 3.236276 4.017825 5.428969 5.383264 6.142692 5.459403 4.213242 2.828289 2.126141 1.759887
1 Barnet 2.509677 2.381042 2.136076 2.271601 2.174120 1.646113 2.030966 2.946373 2.671089 2.893046 2.718031 2.139966 1.537465 1.184450 0.990578
2 Bexley 1.811202 1.815898 1.770328 1.916928 1.939469 1.446753 2.174094 3.190996 2.790570 3.135072 2.856872 2.189345 1.466353 1.168869 0.889258
3 Brent 3.780913 3.731985 3.661067 3.809729 3.936023 3.274908 3.532214 5.025038 4.970964 4.318946 4.369714 3.561057 2.647838 2.094117 1.617005
4 Bromley 1.868097 1.858206 1.756767 1.852054 1.687616 1.405241 1.996908 2.838646 2.542845 2.993951 2.798361 2.037068 1.311893 1.012404 0.853916
5 Camden 3.661282 3.521265 3.178000 3.193087 2.904914 2.248004 2.464611 3.273089 2.841291 3.414586 3.104903 2.352870 1.666169 1.449623 1.155212
6 Croydon 2.672229 2.450758 2.224936 2.547461 2.571245 2.114916 2.645969 4.012160 4.012348 4.425973 4.096887 2.979819 2.005514 1.799862 0.699267
7 Ealing 2.704908 2.641139 2.461524 2.700555 2.683558 2.213453 2.598402 4.095801 3.699165 3.752674 3.524498 3.050565 2.230260 1.854996 1.554821
8 Enfield 2.875181 3.026592 3.048945 3.371310 3.278479 2.671402 3.250408 4.702484 4.637957 5.119604 4.664601 3.723892 2.558241 1.762785 1.573756
9 Greenwich 3.537161 3.539570 3.448438 3.583992 3.468155 2.738892 3.483653 4.547688 4.604758 4.559690 4.067689 3.243485 2.315350 1.977718 1.483319
10 Hackney 5.132922 5.071335 4.682593 4.938649 5.187173 4.368813 4.557818 6.199743 6.340935 6.008882 5.124737 4.194545 2.924038 2.345373 1.879604
11 Hammersmith and Fulham 3.395747 3.297569 2.952867 2.935972 2.938470 2.505337 2.961148 4.119914 3.947400 3.639699 3.392687 3.050272 2.209876 1.681567 1.320470
12 Haringey 4.603853 4.493058 4.462175 4.825596 4.658684 3.844618 4.327817 5.880588 5.935578 5.686335 5.050918 4.125280 2.990864 2.295694 1.780361
13 Harrow 2.004209 2.065733 1.884259 2.000781 1.885877 1.433219 1.840255 2.835923 2.492936 2.524647 2.406057 1.837852 1.349553 1.053946 0.869832
14 Havering 1.561857 1.544642 1.415573 1.662290 1.681634 1.425823 2.128608 3.303033 3.118212 3.493791 3.376742 2.476408 1.696906 1.469962 1.197887
15 Hillingdon 1.852870 2.071840 1.904519 2.102199 1.930525 1.574868 1.958870 3.425265 2.746973 2.800385 2.596537 2.073182 1.379002 1.203437 1.015285
16 Hounslow 1.908193 1.926343 1.886419 2.156077 2.040881 1.650237 2.013900 3.213676 2.931098 2.933833 2.625218 2.221391 1.696231 1.310363 0.406303
17 Islington 4.488591 4.406882 4.130966 4.127440 3.983171 3.352832 3.754596 5.240989 4.559540 4.549491 4.097250 3.299005 2.359328 1.866727 1.544591
18 Kensington and Chelsea 2.428878 2.291296 1.961554 2.117027 2.009934 1.684415 1.975403 2.927149 2.877257 2.804449 2.595132 2.118567 1.633612 1.391368 1.069797
19 Kingston upon Thames 1.585016 1.485502 1.363345 1.250929 1.097695 0.900539 1.215612 1.961503 1.581155 1.826050 1.770468 1.382427 1.058979 0.892007 0.753790
20 Lambeth 4.892338 4.676460 4.259090 4.302013 4.020871 3.405068 3.698048 5.237063 4.928736 5.233822 4.907332 3.982104 2.813396 2.342995 1.679292
21 Lewisham 4.098478 3.966955 3.738392 3.946675 3.655428 2.946128 3.351131 4.710934 4.807052 5.335346 4.950092 3.842339 2.759655 2.304513 1.690705
22 Merton 2.104156 2.107041 1.949553 2.164582 1.974706 1.445314 1.766697 2.569185 2.276764 2.819651 2.654266 2.164032 1.636316 1.381456 1.014532
23 Newham 4.188469 3.861027 3.742186 4.327205 4.592986 3.949074 4.453551 6.001933 6.188660 5.159122 4.734236 3.695789 2.542984 1.809943 1.411763
24 Redbridge 2.302662 2.137252 2.091897 2.431295 2.477483 1.998450 2.649924 3.596797 3.468344 3.774670 3.385780 2.519415 1.709196 1.346948 0.992650
25 Richmond upon Thames 1.586414 1.504476 1.282289 1.277788 1.161978 0.838630 1.158132 1.864362 1.555512 1.561012 1.454738 1.314629 1.021361 0.876340 0.765111
26 Southwark 4.769097 4.776921 4.375731 4.038886 3.747692 3.065115 3.294703 4.480129 4.521741 5.161393 4.795608 3.775046 2.724930 2.114081 1.166130
27 Sutton 1.613217 1.554491 1.540229 1.711413 1.570089 1.272921 1.725797 2.730825 2.474939 2.774965 2.566874 1.956114 1.329475 0.924983 0.347571
28 Tower Hamlets 5.252472 5.263014 4.643250 4.941346 5.285745 4.606519 4.767128 5.667477 5.545117 5.777792 5.250545 3.916112 2.617814 1.837056 1.567036
29 Waltham Forest 3.752919 3.779719 3.821313 4.033536 4.215268 3.574572 4.096201 5.464373 5.318004 5.409249 4.986628 4.033194 2.820658 1.796960 1.560955
30 Wandsworth 2.674663 2.500453 2.377982 2.367301 2.159060 1.681282 1.986930 3.031680 2.741116 2.816238 2.578450 2.187437 1.554706 1.278728 0.960439
31 Westminster 2.667223 2.411146 2.084518 2.103008 2.057613 1.674693 1.813667 2.662092 2.523473 3.156811 2.761202 2.202777 1.587527 1.403493 1.081711
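The November columns above were picked with a hard-coded list of iloc positions. Since `read_excel` parses the quarterly headers as timestamps, an alternative (a sketch, with a tiny stand-in frame in place of the real `job_seekers_df`) is to select them by inspecting each column label:

```python
import pandas as pd

# Stand-in for the quarterly-snapshot layout: timestamp column labels.
df = pd.DataFrame([[1.0, 2.0, 3.0]],
                  columns=[pd.Timestamp('2002-08-01'),
                           pd.Timestamp('2002-11-01'),
                           pd.Timestamp('2003-11-01')])

# Keep only the November snapshots for the 2002-2016 span, by label
# rather than by position, so a column shift in the source file
# would not silently select the wrong quarters.
nov_cols = [c for c in df.columns if c.month == 11 and 2002 <= c.year <= 2016]
print(df[nov_cols].columns.tolist())
```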
In [896]:
# Checking for missing values
job_seekers.isnull().sum()
Out[896]:
Borough    0
2002       0
2003       0
2004       0
2005       0
2006       0
2007       0
2008       0
2009       0
2010       0
2011       0
2012       0
2013       0
2014       0
2015       0
2016       0
dtype: int64
In [897]:
# Reshaping the data frame to long format for later concatenation with the other data frames
job_seekers_new = pd.melt(job_seekers, id_vars = ['Borough'], var_name = 'Year', value_name = 'Job Seekers Allowance Claimants (Rate)')
In [898]:
job_seekers_new.head()
Out[898]:
Borough Year Job Seekers Allowance Claimants (Rate)
0 Barking and Dagenham 2002 2.952248
1 Barnet 2002 2.509677
2 Bexley 2002 1.811202
3 Brent 2002 3.780913
4 Bromley 2002 1.868097
In [899]:
job_seekers_new.shape
Out[899]:
(480, 3)

National Insurance Number Registrations (as an indicator of immigration)

In [900]:
# Importing National Insurance Number Registrations (Immigration) file (Rates per 1,000 working age population) 
immigration_df = pd.read_excel('national-insurance-number-registrations.xls', sheet_name = 'Borough Summary', header = [1])
In [901]:
immigration_df.head()
Out[901]:
Old Code New Code Area 2002/03 2003/04 2004/05 2005/06 2006/07 2007/08 2008/09 ... 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 00AA E09000001 City of London 332.0 374.0 382.0 528.0 716.0 800.0 724.0 ... 5859.0 5691.0 5748.0 5798.0 5682.0 5915.0 6405.0 6872.0 5020.0 5811.0
2 00AB E09000002 Barking and Dagenham 2115.0 2345.0 2517.0 3161.0 3063.0 4287.0 4884.0 ... 112268.0 116171.0 119166.0 120526.0 122471.0 125164.0 127461.0 130122.0 133380.0 134378.0
3 00AC E09000003 Barnet 5940.0 6269.0 6813.0 8863.0 9234.0 11522.0 10932.0 ... 228299.0 231634.0 235392.0 238040.0 240658.0 243908.0 246528.0 250359.0 249882.0 251855.0
4 00AD E09000004 Bexley 960.0 1029.0 983.0 1400.0 1540.0 1645.0 1753.0 ... 144957.0 146566.0 147684.0 147714.0 148903.0 150714.0 152284.0 154061.0 154917.0 155514.0

5 rows × 53 columns

In [902]:
# Applying transformations to keep it consistent with the other data frames
immigration_df.set_index('Area', inplace=True)
immigration_df.index.name = 'Borough'
In [903]:
immigration_df.head()
Out[903]:
Old Code New Code 2002/03 2003/04 2004/05 2005/06 2006/07 2007/08 2008/09 2009/10 ... 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
Borough
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
City of London 00AA E09000001 332.0 374.0 382.0 528.0 716.0 800.0 724.0 580.0 ... 5859.0 5691.0 5748.0 5798.0 5682.0 5915.0 6405.0 6872.0 5020.0 5811.0
Barking and Dagenham 00AB E09000002 2115.0 2345.0 2517.0 3161.0 3063.0 4287.0 4884.0 5148.0 ... 112268.0 116171.0 119166.0 120526.0 122471.0 125164.0 127461.0 130122.0 133380.0 134378.0
Barnet 00AC E09000003 5940.0 6269.0 6813.0 8863.0 9234.0 11522.0 10932.0 8569.0 ... 228299.0 231634.0 235392.0 238040.0 240658.0 243908.0 246528.0 250359.0 249882.0 251855.0
Bexley 00AD E09000004 960.0 1029.0 983.0 1400.0 1540.0 1645.0 1753.0 1733.0 ... 144957.0 146566.0 147684.0 147714.0 148903.0 150714.0 152284.0 154061.0 154917.0 155514.0

5 rows × 52 columns

In [904]:
# Removing the rows and columns that are not needed for our analysis
immigration = immigration_df.iloc[2:34, 19:34]
In [905]:
# Resetting index
immigration.reset_index(inplace=True)
In [906]:
# Slightly changing the column names for readability 
immigration.columns = ['Borough', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016']
In [907]:
immigration.head()
Out[907]:
Borough 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
0 Barking and Dagenham 20.253383 22.435468 24.076217 30.144093 29.041709 40.224061 44.852192 45.854562 52.405506 39.407213 29.147238 35.967698 61.735004 59.139658 58.291450
1 Barnet 28.164072 29.635337 31.995529 41.042478 42.280026 52.051428 48.769389 37.534111 45.308547 37.562874 34.876491 39.084510 59.087853 53.113642 46.672978
2 Bexley 6.935214 7.405701 7.048262 9.955343 10.874170 11.518317 12.169724 11.955269 13.891353 10.373500 8.265973 9.408810 13.986756 14.433558 14.059366
3 Brent 51.338043 53.723179 68.535571 80.849135 81.886747 98.971148 93.511241 95.109065 88.631436 70.045453 66.870002 76.716659 114.922554 100.890910 87.941571
4 Bromley 8.303437 7.234291 7.882799 11.176629 11.395907 10.910071 11.101750 9.078037 11.770573 9.199688 8.521364 9.519662 13.804687 14.370245 13.328949
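Typing fifteen year labels by hand invites inconsistencies such as a stray leading space in a column name, which would then fail to line up with the other frames when merging. A sketch of generating the same labels programmatically (the 2002-2016 span matches the columns kept above):

```python
# Build the column labels from the year range instead of a hand-typed
# list, so every label is guaranteed to be a clean string like '2002'.
year_cols = ['Borough'] + [str(y) for y in range(2002, 2017)]
print(len(year_cols))  # 16
```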
In [908]:
# Reshaping the data frame to long format for later concatenation with the other data frames
immigration_new = pd.melt(immigration, id_vars = ['Borough'], var_name = 'Year', value_name = 'National Insurance Number Registrations (Immigration)(Rate)')
In [909]:
immigration_new.head()
Out[909]:
Borough Year National Insurance Number Registrations (Immigration)(Rate)
0 Barking and Dagenham 2002 20.253383
1 Barnet 2002 28.164072
2 Bexley 2002 6.935214
3 Brent 2002 51.338043
4 Bromley 2002 8.303437
In [910]:
# Checking for missing values
immigration_new.isnull().sum()
Out[910]:
Borough                                                        0
Year                                                           0
National Insurance Number Registrations (Immigration)(Rate)    0
dtype: int64
In [911]:
immigration_new.shape
Out[911]:
(480, 3)
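Each melted indicator ends up as a 480-row frame keyed by Borough and Year, so the frames can later be combined column-wise on that shared key. A sketch with two one-row toy frames standing in for `pubs_bars_new` and `job_seekers_new`:

```python
import pandas as pd

# Toy stand-ins for two melted indicator frames sharing the
# Borough/Year key (values are illustrative).
a = pd.DataFrame({'Borough': ['Barnet'], 'Year': ['2002'],
                  'Proportion of pubs/bars': [21.95]})
b = pd.DataFrame({'Borough': ['Barnet'], 'Year': ['2002'],
                  'Job Seekers Allowance Claimants (Rate)': [2.51]})

# Merging on both key columns pairs each borough-year with its
# counterpart measurement from the other frame.
combined = a.merge(b, on=['Borough', 'Year'])
print(combined.shape)  # (1, 4)
```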

Income Support Claimants (as an indicator of poverty)

In [912]:
# Importing Income Support Claimants file (Rates are as a percentage of all people aged 16-64 from ONS mid-year estimates) 
income_support_df = pd.read_excel('income-support-borough.xls', sheet_name = 'Rates')
In [913]:
income_support_df.head()
Out[913]:
Code Area 1999-08-01 00:00:00 1999-11-01 00:00:00 2000-02-01 00:00:00 2000-05-01 00:00:00 2000-08-01 00:00:00 2000-11-01 00:00:00 2001-02-01 00:00:00 2001-05-01 00:00:00 ... 2016-08-01 00:00:00 2016-11-01 00:00:00 2017-02-01 00:00:00 2017-05-01 00:00:00 2017-08-01 00:00:00 2017-11-01 00:00:00 2018-02-01 00:00:00 2018-05-01 00:00:00 2018-08-01 00:00:00 2018-11-01 00:00:00
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 E09000001 City of London 5.646300 5.646300 5.013928 5.013928 5.199629 5.199629 5.059316 5.233775 ... 0.436554 0.436554 0.597610 0.597610 0.597610 0.597610 0.398406 0.398406 0.398406 0.199203
2 E09000002 Barking and Dagenham 15.851676 15.941971 15.579925 15.500889 15.491010 15.629322 15.431411 15.441098 ... 2.758949 2.758949 2.526616 2.436647 2.399160 2.354176 2.339181 2.196731 2.046784 1.889339
3 E09000003 Barnet 9.425492 9.479464 9.289607 9.260516 9.376879 9.439908 9.191640 9.215428 ... 1.094428 1.094428 1.012478 1.000472 0.996470 0.968457 0.956451 0.932440 0.860406 0.784370
4 E09000004 Bexley 7.940382 7.889388 7.808426 7.866373 7.946051 8.025729 7.977319 8.027946 ... 1.661679 1.661679 1.568582 1.549217 1.523396 1.491121 1.484666 1.458846 1.458846 1.387840

5 rows × 80 columns

In [914]:
# Dropping the area code and indexing by borough to match the other data frames
income_support_df.drop('Code', axis=1, inplace=True)
income_support_df.set_index('Area', inplace=True)
income_support_df.index.name = 'Borough'
In [915]:
income_support_df.head()
Out[915]:
1999-08-01 00:00:00 1999-11-01 00:00:00 2000-02-01 00:00:00 2000-05-01 00:00:00 2000-08-01 00:00:00 2000-11-01 00:00:00 2001-02-01 00:00:00 2001-05-01 00:00:00 2001-08-01 00:00:00 2001-11-01 00:00:00 ... 2016-08-01 00:00:00 2016-11-01 00:00:00 2017-02-01 00:00:00 2017-05-01 00:00:00 2017-08-01 00:00:00 2017-11-01 00:00:00 2018-02-01 00:00:00 2018-05-01 00:00:00 2018-08-01 00:00:00 2018-11-01 00:00:00
Borough
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
City of London 5.646300 5.646300 5.013928 5.013928 5.199629 5.199629 5.059316 5.233775 5.233775 5.059316 ... 0.436554 0.436554 0.597610 0.597610 0.597610 0.597610 0.398406 0.398406 0.398406 0.199203
Barking and Dagenham 15.851676 15.941971 15.579925 15.500889 15.491010 15.629322 15.431411 15.441098 15.537968 15.528281 ... 2.758949 2.758949 2.526616 2.436647 2.399160 2.354176 2.339181 2.196731 2.046784 1.889339
Barnet 9.425492 9.479464 9.289607 9.260516 9.376879 9.439908 9.191640 9.215428 9.182125 9.139307 ... 1.094428 1.094428 1.012478 1.000472 0.996470 0.968457 0.956451 0.932440 0.860406 0.784370
Bexley 7.940382 7.889388 7.808426 7.866373 7.946051 8.025729 7.977319 8.027946 8.049643 7.962854 ... 1.661679 1.661679 1.568582 1.549217 1.523396 1.491121 1.484666 1.458846 1.458846 1.387840

5 rows × 78 columns

In [916]:
# Removing rows and columns not needed for the analysis. The source gives four figures per year, so to stay consistent with the other data frames we keep only the November figure for each year
income_support = income_support_df.iloc[2:34,[13,17,21,25,29,33,37,41,45,49,53,57,61,65,69]]
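The `iloc` column positions above are hard-coded. Since the column labels are timestamps (as the head above shows), the November snapshot could instead be selected by month, which would survive layout changes in the source sheet. A minimal sketch on a toy frame with hypothetical values:

```python
import pandas as pd

# Toy frame mimicking the quarterly column layout of the income support sheet
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0]],
    columns=pd.to_datetime(['2002-08-01', '2002-11-01', '2003-08-01', '2003-11-01']),
)

# Keep only the November snapshot of each year, then rename columns to plain years
november = df.loc[:, [c for c in df.columns if c.month == 11]]
november.columns = [str(c.year) for c in november.columns]
```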
In [917]:
income_support.columns = ['2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016']
In [918]:
# Resetting index
income_support.reset_index(inplace = True)
In [919]:
income_support.head()
Out[919]:
Borough 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
0 Barking and Dagenham 15.503653 10.485831 10.646337 10.632921 10.609753 10.480587 10.138579 8.844907 8.057088 7.074165 5.509185 4.319390 3.275702 2.871467 2.758949
1 Barnet 9.222074 5.497830 5.696547 5.561550 5.526531 5.412047 5.241862 4.621133 4.191958 3.619494 2.558394 1.865718 1.414468 1.147943 1.094428
2 Bexley 7.831012 4.692437 4.603239 4.636346 4.730970 4.670345 4.574927 4.263333 4.011844 3.588744 2.911031 2.303513 1.944080 1.740170 1.661679
3 Brent 12.663241 8.226701 8.245167 8.139428 7.964364 7.432596 6.997779 5.977701 5.351475 4.586859 3.336117 2.380165 1.852114 1.538721 1.454401
4 Bromley 7.616807 4.472300 4.499145 4.484352 4.541658 4.452138 4.376997 4.032979 3.775947 3.373894 2.530648 1.956986 1.634896 1.444888 1.383934
In [920]:
# Melting the data frame for later concatenation with the other data frames
income_support_new = pd.melt(income_support, id_vars = ['Borough'], var_name = 'Year', value_name = 'Income Support Claimants (Rate)')
In [921]:
income_support_new.head()
Out[921]:
Borough Year Income Support Claimants (Rate)
0 Barking and Dagenham 2002 15.503653
1 Barnet 2002 9.222074
2 Bexley 2002 7.831012
3 Brent 2002 12.663241
4 Bromley 2002 7.616807
In [922]:
income_support_new.shape
Out[922]:
(480, 3)
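Each wide table here is 32 boroughs × 15 year columns, so `pd.melt` always produces 32 × 15 = 480 rows of (Borough, Year, value) triples, matching the shape checks. A toy illustration of that reshape:

```python
import pandas as pd

# 2 boroughs x 2 year columns (hypothetical values)
wide = pd.DataFrame({
    'Borough': ['A', 'B'],
    '2002': [1.0, 2.0],
    '2003': [3.0, 4.0],
})

# Wide to long: one row per (Borough, Year) pair
long = pd.melt(wide, id_vars=['Borough'], var_name='Year', value_name='Rate')
```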

Affordable Housing (as a possible socioeconomic indicator)

In [923]:
# Importing DCLG Affordable Housing Supply file (Total affordable housing completions by financial year) 
affordable_housing_df = pd.read_excel('dclg-affordable-housing-borough.xlsx', sheet_name = 'data')
In [924]:
affordable_housing_df.head()
Out[924]:
Former ONS code Current ONS code Area name 1991-92 1992-93 1993-94 1994-95 1995-96 1996-97 1997-98 ... 2009-10 2010-11 2011-12 2012-13 2013-14 2014-15 2015-16 2016-17 2017-18 2018-19
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 00AA E09000001 City of London 0.0 0.0 0.0 0.0 43.0 3.0 69.0 ... 46.0 0.0 3.0 6.0 20.0 44.0 1.0 0.0 19.0 0.0
2 00AB E09000002 Barking and Dagenham 53.0 90.0 128.0 197.0 115.0 404.0 194.0 ... 189.0 284.0 384.0 114.0 530.0 1200.0 318.0 59.0 320.0 275.0
3 00AC E09000003 Barnet 73.0 405.0 471.0 428.0 432.0 228.0 322.0 ... 214.0 392.0 666.0 357.0 393.0 441.0 128.0 320.0 237.0 150.0
4 00AD E09000004 Bexley 129.0 159.0 372.0 135.0 122.0 172.0 195.0 ... 284.0 317.0 357.0 172.0 96.0 387.0 197.0 69.0 136.0 163.0

5 rows × 31 columns

In [925]:
# Selecting the 32 boroughs and the 2002-2016 financial years to match the other data frames
affordable_housing = affordable_housing_df.iloc[2:34,[2,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]]
In [926]:
affordable_housing.columns = ['Borough', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016']
In [927]:
affordable_housing.head()
Out[927]:
Borough 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
2 Barking and Dagenham 204.0 317.0 288.0 331.0 303.0 420.0 238.0 189.0 284.0 384.0 114.0 530.0 1200.0 318.0 59.0
3 Barnet 231.0 243.0 254.0 407.0 71.0 347.0 316.0 214.0 392.0 666.0 357.0 393.0 441.0 128.0 320.0
4 Bexley 262.0 171.0 154.0 168.0 158.0 391.0 152.0 284.0 317.0 357.0 172.0 96.0 387.0 197.0 69.0
5 Brent 328.0 427.0 314.0 259.0 522.0 295.0 648.0 557.0 703.0 852.0 530.0 277.0 1316.0 208.0 226.0
6 Bromley 267.0 276.0 195.0 372.0 280.0 350.0 340.0 323.0 505.0 394.0 358.0 59.0 209.0 86.0 74.0
In [928]:
# Resetting index
affordable_housing.reset_index(drop = True, inplace= True)
In [929]:
affordable_housing.head()
Out[929]:
Borough 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
0 Barking and Dagenham 204.0 317.0 288.0 331.0 303.0 420.0 238.0 189.0 284.0 384.0 114.0 530.0 1200.0 318.0 59.0
1 Barnet 231.0 243.0 254.0 407.0 71.0 347.0 316.0 214.0 392.0 666.0 357.0 393.0 441.0 128.0 320.0
2 Bexley 262.0 171.0 154.0 168.0 158.0 391.0 152.0 284.0 317.0 357.0 172.0 96.0 387.0 197.0 69.0
3 Brent 328.0 427.0 314.0 259.0 522.0 295.0 648.0 557.0 703.0 852.0 530.0 277.0 1316.0 208.0 226.0
4 Bromley 267.0 276.0 195.0 372.0 280.0 350.0 340.0 323.0 505.0 394.0 358.0 59.0 209.0 86.0 74.0
In [930]:
# Melting the data frame for later concatenation with the other data frames
affordable_housing_new = pd.melt(affordable_housing, id_vars = ['Borough'], var_name = 'Year', value_name = 'Affordable Housing Supply')
In [931]:
affordable_housing_new.head()
Out[931]:
Borough Year Affordable Housing Supply
0 Barking and Dagenham 2002 204.0
1 Barnet 2002 231.0
2 Bexley 2002 262.0
3 Brent 2002 328.0
4 Bromley 2002 267.0
In [932]:
# Checking for missing values
affordable_housing_new.isnull().sum()
Out[932]:
Borough                      0
Year                         0
Affordable Housing Supply    0
dtype: int64
In [933]:
affordable_housing.shape
Out[933]:
(32, 16)

Ratio of House Prices to Earnings (as indicator of income inequality)

In [934]:
# Importing Ratio of House Prices to Earnings file (Median House Prices to Residence-Based Earnings)
House_Prices_to_Earnings_df = pd.read_excel('ratio-house-price-earnings-residence-based.xls', sheet_name = 'Median Earnings to Prices ratio')
In [935]:
House_Prices_to_Earnings_df.head()
Out[935]:
New Code Old Code Area 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 E09000001 00AA City of London 6.29 : : : : : : 9.52 : 8.66 : : : : 14.83 : 16.54
2 E09000002 00AB Barking and Dagenham 4.68 6.22 7.18 7.15 6.55 7.05 7.46 5.71 6.3 6.56 6.38 6.28 7.54 8.24 9.89 10 10.42
3 E09000003 00AC Barnet 7.44 8.4 8.64 8.46 9.11 9.83 9.16 8.47 9.7 10.08 10.05 10.16 12.14 13.3 14.05 13.7 14.58
4 E09000004 00AD Bexley 5.56 6.29 6.57 6.73 6.79 7.19 6.97 6.29 6.72 6.77 6.53 6.68 7.69 8.47 9.44 10.01 10.02
In [936]:
# Selecting the 32 boroughs and the 2002-2016 columns to match the other data frames
House_Prices_to_Earnings = House_Prices_to_Earnings_df.iloc[2:34,2:18]
In [937]:
# Resetting index
House_Prices_to_Earnings.reset_index(drop = True, inplace = True)
In [938]:
# Renaming columns to match the other data frames
House_Prices_to_Earnings.columns = ['Borough', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016']
In [939]:
# Melting the data frame for later concatenation with the other data frames
House_Prices_to_Earnings_new = pd.melt(House_Prices_to_Earnings, id_vars = ['Borough'], var_name = 'Year', value_name = 'Ratio of House Prices to Earnings')
In [940]:
House_Prices_to_Earnings_new.head()
Out[940]:
Borough Year Ratio of House Prices to Earnings
0 Barking and Dagenham 2002 4.68
1 Barnet 2002 7.44
2 Bexley 2002 5.56
3 Brent 2002 8.1
4 Bromley 2002 6.48
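The source sheet marks suppressed values with ':' (visible in the City of London row of the head above), which leaves the ratio column holding strings rather than numbers. If any such placeholders survive the borough slice, coercing to numeric would be needed before correlation or regression. A minimal sketch:

```python
import pandas as pd

# Toy column containing a ':' suppression marker, as in the source sheet
ratios = pd.Series(['4.68', ':', '7.44'])

# Coerce to floats; non-numeric markers become NaN
numeric = pd.to_numeric(ratios, errors='coerce')
```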
In [941]:
# Checking for missing values
House_Prices_to_Earnings_new.isnull().sum()
Out[941]:
Borough                              0
Year                                 0
Ratio of House Prices to Earnings    0
dtype: int64
In [942]:
House_Prices_to_Earnings.shape
Out[942]:
(32, 16)

Personal Insolvency and Bankruptcy Statistics (as a possible socioeconomic indicator)

In [943]:
# Importing the Personal Insolvency and Bankruptcy Statistics file, one sheet per year (rates per 10,000 adult population)
insolvency_sheets = [pd.read_excel('personal-insolvency.xls', sheet_name=str(year)) for year in range(2002, 2017)]
insolvency_df = pd.concat(insolvency_sheets, axis=1)
In [944]:
insolvency_df.head()
Out[944]:
Code Area New Personal Insolvencies (counts) New Personal Insolvencies (rates) New Bankruptcy Orders (counts) New Bankruptcy Orders (rates) New Individual Voluntary Arrangements (IVAs) (counts) New Individual Voluntary Arrangements (IVAs) (rates) Code Area ... Unnamed: 0 Unnamed: 1 New Personal Insolvencies New Personal Insolvencies.1 New Bankruptcy Orders New Bankruptcy Orders.1 New Individual Voluntary Arrangements (IVAs) New Individual Voluntary Arrangements (IVAs).1 New Debt Relief Orders (DROs) New Debt Relief Orders (DROs).1
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... LA_CODE LA_NAME Count Rate per 10000 Count Rate per 10000 Count Rate per 10000 Count Rate per 10000
1 00AA City of London 2.0 2.9 2.0 2.9 0.0 0.0 00AA City of London ... E09000001 City of London 5 6.07165 2 2.42866 3 3.64299 0 0
2 00AB Barking and Dagenham 50.0 4.0 40.0 3.2 10.0 0.8 00AB Barking and Dagenham ... E09000002 Barking and Dagenham 330 22.8094 70 4.83836 203 14.0313 57 3.93981
3 00AC Barnet 113.0 4.5 84.0 3.4 29.0 1.2 00AC Barnet ... E09000003 Barnet 290 9.80568 73 2.46833 155 5.24097 62 2.09639
4 00AD Bexley 82.0 4.9 67.0 4.0 15.0 0.9 00AD Bexley ... E09000004 Bexley 284 15.0842 47 2.49632 193 10.2509 44 2.33698

5 rows × 136 columns

In [945]:
# Selecting the borough names and each year's insolvency-rate column to match the other data frames
insolvency = insolvency_df.iloc[2:34,[1,3,11,19,27,35,43,51,59,69,79,89,99,109,119,129]]
In [946]:
insolvency.head()
Out[946]:
Area New Personal Insolvencies (rates) New Personal Insolvencies (rates) New Personal Insolvencies (rates) New Personal Insolvencies (rates) New Personal Insolvencies (rates) New Personal Insolvencies (rates) New Personal Insolvencies (rates) New Personal Insolvencies (rates) New Personal Insolvencies.1 New Personal Insolvencies.1 New Personal Insolvencies.1 New Personal Insolvencies.1 New Personal Insolvencies.1 New Personal Insolvencies.1 New Personal Insolvencies.1
2 Barking and Dagenham 4.0 8.8 8.6 15.5 27.6 30.8 29.2 31.2 31.6 30.5 27.4992 25.0097 23.8526 19.8439 22.8094
3 Barnet 4.5 5.2 7.1 11.6 17.3 17.0 14.4 17.1 15.2 13.5 11.5559 10.3376 11.7798 9.66071 9.80568
4 Bexley 4.9 6.3 8.7 14.2 20.1 19.1 17.1 23.6 24.5 21.4 18.6963 16.2284 15.8963 12.0861 15.0842
5 Brent 2.8 4.3 6.1 7.8 17.9 17.5 14.0 16.0 17.7 17.2 12.5107 10.0933 10.3345 10.4419 10.4095
6 Bromley 4.0 4.5 7.0 13.1 20.7 17.8 17.0 19.2 19.2 14.5 15.2184 14.4676 14.131 9.93446 13.1313
In [947]:
# Resetting index
insolvency.reset_index(drop = True, inplace = True)
In [948]:
# Renaming columns to match the other data frames
insolvency.columns = ['Borough', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016']
In [949]:
# Melting the data frame for later concatenation with the other data frames
insolvency_new = pd.melt(insolvency, id_vars = ['Borough'], var_name = 'Year', value_name = 'Personal Insolvency (Rate)')
In [950]:
insolvency_new.head()
Out[950]:
Borough Year Personal Insolvency (Rate)
0 Barking and Dagenham 2002 4
1 Barnet 2002 4.5
2 Bexley 2002 4.9
3 Brent 2002 2.8
4 Bromley 2002 4
In [951]:
insolvency_new.isnull().sum()
Out[951]:
Borough                       0
Year                          0
Personal Insolvency (Rate)    0
dtype: int64
In [952]:
# Selecting the borough names and each year's bankruptcy-rate column to match the other data frames
bankruptcy = insolvency_df.iloc[2:34,[1,5,13,21,29,37,45,53,61,71,81,91,101,111,121,131]]
In [953]:
bankruptcy.head()
Out[953]:
Area New Bankruptcy Orders (rates) New Bankruptcy Orders (rates) New Bankruptcy Orders (rates) New Bankruptcy Orders (rates) New Bankruptcy Orders (rates) New Bankruptcy Orders (rates) New Bankruptcy Orders (rates) New Bankruptcy Orders (rates) New Bankruptcy Orders.1 New Bankruptcy Orders.1 New Bankruptcy Orders.1 New Bankruptcy Orders.1 New Bankruptcy Orders.1 New Bankruptcy Orders.1 New Bankruptcy Orders.1
2 Barking and Dagenham 3.2 6.7 7.0 10.7 18.2 18.2 17.9 16.6 11.4 10.3 7.04157 6.36211 4.45441 4.0959 4.83836
3 Barnet 3.4 3.8 5.3 8.4 9.7 10.4 9.7 10.3 7.7 5.9 4.95254 3.92969 3.6594 2.78476 2.46833
4 Bexley 4.0 5.3 7.0 10.6 10.7 9.8 9.4 11.9 9.8 7.1 5.7313 3.90582 2.54991 2.20236 2.49632
5 Brent 2.2 3.2 4.6 5.5 9.4 9.2 7.9 8.6 8.5 6.7 3.62152 3.59601 2.75587 2.00805 1.98655
6 Bromley 2.7 3.4 5.1 9.2 13.1 11.4 10.6 11.1 8.5 5.9 6.05463 4.08163 3.48272 2.53309 2.87863
In [954]:
# Resetting index
bankruptcy.reset_index(drop = True, inplace = True)
In [955]:
# Renaming columns to match the other data frames
bankruptcy.columns = ['Borough', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016']
In [956]:
# Melting the data frame for later concatenation with the other data frames
bankruptcy_new = pd.melt(bankruptcy, id_vars = ['Borough'], var_name = 'Year', value_name = 'Personal Bankruptcy (Rate)')
In [957]:
bankruptcy_new.head()
Out[957]:
Borough Year Personal Bankruptcy (Rate)
0 Barking and Dagenham 2002 3.2
1 Barnet 2002 3.4
2 Bexley 2002 4
3 Brent 2002 2.2
4 Bromley 2002 2.7

Bringing all datasets together

In [958]:
# Resetting index
violence.reset_index(inplace = True)
In [959]:
# Melting the data frame for later concatenation with the other data frames
violence_new = pd.melt(violence, id_vars = ['Borough'], var_name = 'Year', value_name = 'Violence Rate')
In [960]:
# Years 1999, 2000 and 2001 (the first 3 x 32 = 96 rows) are removed to stay consistent with the other data frames
violence_amended_years = violence_new.iloc[96 :,:]
In [961]:
# Resetting index
violence_amended_years.reset_index(drop = True, inplace = True)
In [962]:
violence_amended_years.head(20)
Out[962]:
Borough Year Violence Rate
0 Barking and Dagenham Year 2002 26.401053
1 Barnet Year 2002 16.365519
2 Bexley Year 2002 16.223765
3 Brent Year 2002 23.333370
4 Bromley Year 2002 15.937007
5 Camden Year 2002 28.518620
6 Croydon Year 2002 22.783716
7 Ealing Year 2002 23.098415
8 Enfield Year 2002 19.921993
9 Greenwich Year 2002 31.118496
10 Hackney Year 2002 32.280848
11 Hammersmith and Fulham Year 2002 25.135328
12 Haringey Year 2002 24.261162
13 Harrow Year 2002 13.313805
14 Havering Year 2002 17.333618
15 Hillingdon Year 2002 20.168945
16 Hounslow Year 2002 32.781897
17 Islington Year 2002 35.238011
18 Kensington and Chelsea Year 2002 21.186984
19 Kingston upon Thames Year 2002 19.092233
In [963]:
violence_amended_years.shape
Out[963]:
(480, 3)
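Slicing off the first 96 rows works because 3 years × 32 boroughs = 96, but it depends silently on row order. An equivalent label-based filter could be written instead; a sketch on a toy long-format frame (hypothetical values, assuming plain year strings in the `Year` column):

```python
import pandas as pd

# Toy melted frame covering 1999-2002 for two boroughs
long = pd.DataFrame({
    'Borough': ['A', 'B'] * 4,
    'Year': ['1999', '1999', '2000', '2000', '2001', '2001', '2002', '2002'],
    'Violence Rate': range(8),
})

# Drop the early years by label rather than by row position
kept = long[~long['Year'].isin(['1999', '2000', '2001'])].reset_index(drop=True)
```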
In [964]:
# Dropping columns that are not needed for concatenation
pubs_bars_amended = pubs_bars_new.drop(['Borough','Year'], axis = 'columns')
In [965]:
pubs_bars_amended.head(300)
Out[965]:
Proportion of pubs/bars
0 19.047619
1 21.951220
2 27.142857
3 24.731183
4 27.551020
5 25.108225
6 29.508197
7 28.301887
8 29.069767
9 32.857143
10 36.046512
11 26.732673
12 30.000000
13 23.880597
14 29.032258
15 30.434783
16 30.769231
17 35.099338
18 19.594595
19 29.850746
20 31.496063
21 30.434783
22 22.580645
23 30.769231
24 21.212121
25 31.372549
26 34.645669
27 27.272727
28 35.087719
29 30.158730
... ...
270 20.779221
271 21.000000
272 20.000000
273 27.096774
274 16.216216
275 21.126761
276 25.600000
277 17.808219
278 20.000000
279 19.354839
280 16.666667
281 26.804124
282 27.067669
283 19.354839
284 24.000000
285 23.076923
286 21.538462
287 18.181818
288 13.157895
289 14.843750
290 26.388889
291 19.791667
292 20.588235
293 21.428571
294 18.421053
295 19.230769
296 19.780220
297 23.287671
298 25.510204
299 20.869565

300 rows × 1 columns

In [966]:
clubs_new.head(300)
Out[966]:
Borough Year Proportion of clubs
0 Barking and Dagenham 2002 9.523810
1 Barnet 2002 4.065041
2 Bexley 2002 7.142857
3 Brent 2002 6.451613
4 Bromley 2002 7.142857
5 Camden 2002 3.463203
6 Croydon 2002 6.557377
7 Ealing 2002 4.716981
8 Enfield 2002 4.651163
9 Greenwich 2002 5.714286
10 Hackney 2002 3.488372
11 Hammersmith and Fulham 2002 4.950495
12 Haringey 2002 3.750000
13 Harrow 2002 7.462687
14 Havering 2002 6.451613
15 Hillingdon 2002 6.521739
16 Hounslow 2002 7.692308
17 Islington 2002 2.649007
18 Kensington and Chelsea 2002 3.378378
19 Kingston upon Thames 2002 5.970149
20 Lambeth 2002 3.149606
21 Lewisham 2002 4.347826
22 Merton 2002 3.225806
23 Newham 2002 6.153846
24 Redbridge 2002 6.060606
25 Richmond upon Thames 2002 2.941176
26 Southwark 2002 3.937008
27 Sutton 2002 3.636364
28 Tower Hamlets 2002 1.754386
29 Waltham Forest 2002 7.936508
... ... ... ...
270 Havering 2010 5.194805
271 Hillingdon 2010 5.000000
272 Hounslow 2010 5.000000
273 Islington 2010 2.580645
274 Kensington and Chelsea 2010 3.378378
275 Kingston upon Thames 2010 4.225352
276 Lambeth 2010 2.400000
277 Lewisham 2010 4.109589
278 Merton 2010 4.615385
279 Newham 2010 4.838710
280 Redbridge 2010 4.166667
281 Richmond upon Thames 2010 3.092784
282 Southwark 2010 2.255639
283 Sutton 2010 4.838710
284 Tower Hamlets 2010 2.400000
285 Waltham Forest 2010 4.615385
286 Wandsworth 2010 3.846154
287 Westminster 2010 4.752066
288 Barking and Dagenham 2011 5.263158
289 Barnet 2011 3.125000
290 Bexley 2011 5.555556
291 Brent 2011 4.166667
292 Bromley 2011 4.901961
293 Camden 2011 2.941176
294 Croydon 2011 5.263158
295 Ealing 2011 3.846154
296 Enfield 2011 4.395604
297 Greenwich 2011 4.109589
298 Hackney 2011 3.061224
299 Hammersmith and Fulham 2011 3.478261

300 rows × 3 columns

In [967]:
# Dropping columns that are not needed for concatenation
clubs_amended = clubs_new.drop(['Borough','Year'], axis = 'columns')
In [968]:
clubs_amended.head()
Out[968]:
Proportion of clubs
0 9.523810
1 4.065041
2 7.142857
3 6.451613
4 7.142857
In [969]:
# Dropping columns that are not needed for concatenation
job_seekers_amended = job_seekers_new.drop(['Borough','Year'], axis = 'columns')
In [970]:
job_seekers_amended.head()
Out[970]:
Job Seekers Allowance Claimants (Rate)
0 2.952248
1 2.509677
2 1.811202
3 3.780913
4 1.868097
In [971]:
# Dropping columns that are not needed for concatenation
immigration_amended = immigration_new.drop(['Borough','Year'], axis = 'columns')
In [972]:
immigration_amended.head()
Out[972]:
National Insurance Number Registrations (Immigration)(Rate)
0 20.253383
1 28.164072
2 6.935214
3 51.338043
4 8.303437
In [973]:
# Dropping columns that are not needed for concatenation
income_support_amended = income_support_new.drop(['Borough','Year'], axis = 'columns')
In [974]:
income_support_amended.head()
Out[974]:
Income Support Claimants (Rate)
0 15.503653
1 9.222074
2 7.831012
3 12.663241
4 7.616807
In [975]:
# Dropping columns that are not needed for concatenation
affordable_housing_amended = affordable_housing_new.drop(['Borough','Year'], axis = 'columns')
In [976]:
affordable_housing_amended.head()
Out[976]:
Affordable Housing Supply
0 204.0
1 231.0
2 262.0
3 328.0
4 267.0
In [977]:
# Dropping columns that are not needed for concatenation
House_Prices_to_Earnings_amended = House_Prices_to_Earnings_new.drop(['Borough','Year'], axis = 'columns')
In [978]:
House_Prices_to_Earnings_amended.head()
Out[978]:
Ratio of House Prices to Earnings
0 4.68
1 7.44
2 5.56
3 8.1
4 6.48
In [979]:
# Dropping columns that are not needed for concatenation
insolvency_amended = insolvency_new.drop(['Borough','Year'], axis = 'columns')
In [980]:
insolvency_amended.head()
Out[980]:
Personal Insolvency (Rate)
0 4
1 4.5
2 4.9
3 2.8
4 4
In [981]:
# Dropping columns that are not needed for concatenation
bankruptcy_amended = bankruptcy_new.drop(['Borough','Year'], axis = 'columns')

Combined Data Frame

In [982]:
# Concatenating all data frames together to run a multiple regression analysis
combined = pd.concat([violence_amended_years, job_seekers_amended, immigration_amended, income_support_amended, affordable_housing_amended, House_Prices_to_Earnings_amended, insolvency_amended, bankruptcy_amended, pubs_bars_amended, clubs_amended], axis=1)
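A note on this step: `pd.concat` with `axis=1` aligns rows on the index, not on borough or year, so it silently assumes every frame has the same row order. A minimal toy illustration (hypothetical data, not the notebook's frames):

```python
import pandas as pd

# Toy frames sharing the same default integer index.
a = pd.DataFrame({'violence': [1.0, 2.0]}, index=[0, 1])
b = pd.DataFrame({'jsa': [0.5, 0.7]}, index=[0, 1])

# axis=1 joins column-wise, matching rows by index position.
side_by_side = pd.concat([a, b], axis=1)
print(side_by_side.shape)  # (2, 2)
```

If the row orders could differ, a `pd.merge` on Borough and Year would be the safer choice.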
In [983]:
combined.head(300)
Out[983]:
Borough Year Violence Rate Job Seekers Allowance Claimants (Rate) National Insurance Number Registrations (Immigration)(Rate) Income Support Claimants (Rate) Affordable Housing Supply Ratio of House Prices to Earnings Personal Insolvency (Rate) Personal Bankruptcy (Rate) Proportion of pubs/bars Proportion of clubs
0 Barking and Dagenham Year 2002 26.401053 2.952248 20.253383 15.503653 204.0 4.68 4 3.2 19.047619 9.523810
1 Barnet Year 2002 16.365519 2.509677 28.164072 9.222074 231.0 7.44 4.5 3.4 21.951220 4.065041
2 Bexley Year 2002 16.223765 1.811202 6.935214 7.831012 262.0 5.56 4.9 4 27.142857 7.142857
3 Brent Year 2002 23.333370 3.780913 51.338043 12.663241 328.0 8.1 2.8 2.2 24.731183 6.451613
4 Bromley Year 2002 15.937007 1.868097 8.303437 7.616807 267.0 6.48 4 2.7 27.551020 7.142857
5 Camden Year 2002 28.518620 3.661282 38.245919 14.042012 131.0 9.72 3.6 3 25.108225 3.463203
6 Croydon Year 2002 22.783716 2.672229 18.270833 9.773511 511.0 5.82 4.4 3.3 29.508197 6.557377
7 Ealing Year 2002 23.098415 2.704908 34.843335 10.619675 257.0 7.08 3.1 2.3 28.301887 4.716981
8 Enfield Year 2002 19.921993 2.875181 21.936058 12.014928 493.0 6.03 4.2 3.1 29.069767 4.651163
9 Greenwich Year 2002 31.118496 3.537161 20.987461 13.991640 325.0 5.99 5.5 4.6 32.857143 5.714286
10 Hackney Year 2002 32.280848 5.132922 34.668635 19.737025 503.0 7.37 4.2 3.7 36.046512 3.488372
11 Hammersmith and Fulham Year 2002 25.135328 3.395747 50.066156 11.553728 136.0 8.55 4.5 3.5 26.732673 4.950495
12 Haringey Year 2002 24.261162 4.603853 38.274967 15.673485 233.0 7.03 4 3.1 30.000000 3.750000
13 Harrow Year 2002 13.313805 2.004209 22.586182 8.789873 176.0 7.59 3.8 3.1 23.880597 7.462687
14 Havering Year 2002 17.333618 1.561857 5.315520 8.662592 96.0 5.63 3.4 2.8 29.032258 6.451613
15 Hillingdon Year 2002 20.168945 1.852870 15.957151 8.813543 208.0 6.47 4 2.5 30.434783 6.521739
16 Hounslow Year 2002 32.781897 1.908193 37.485567 10.559598 190.0 7 3.7 2.6 30.769231 7.692308
17 Islington Year 2002 35.238011 4.488591 34.517385 17.653462 246.0 8.71 5.3 4.6 35.099338 2.649007
18 Kensington and Chelsea Year 2002 21.186984 2.428878 45.835953 9.354960 75.0 10.68 4.8 4.4 19.594595 3.378378
19 Kingston upon Thames Year 2002 19.092233 1.585016 17.966820 5.663454 89.0 7.16 5.2 4.7 29.850746 5.970149
20 Lambeth Year 2002 33.515113 4.892338 35.505864 13.605098 304.0 8.02 4.7 4.3 31.496063 3.149606
21 Lewisham Year 2002 22.506327 4.098478 26.293732 13.083715 403.0 5.85 3.2 2.7 30.434783 4.347826
22 Merton Year 2002 18.156916 2.104156 30.682202 7.132536 135.0 6.81 4 2.8 22.580645 3.225806
23 Newham Year 2002 31.713948 4.188469 51.380634 16.895646 383.0 6.47 3.9 3.2 30.769231 6.153846
24 Redbridge Year 2002 18.688107 2.302662 20.179527 9.660078 259.0 6.22 3.4 2.8 21.212121 6.060606
25 Richmond upon Thames Year 2002 14.517701 1.586414 17.181584 5.436786 32.0 7.26 5.6 4.6 31.372549 2.941176
26 Southwark Year 2002 32.079573 4.769097 33.647743 14.699659 372.0 7.54 3.9 3.5 34.645669 3.937008
27 Sutton Year 2002 16.908945 1.613217 10.923072 7.197175 283.0 6.29 5.1 4.2 27.272727 3.636364
28 Tower Hamlets Year 2002 36.410006 5.252472 45.348975 17.457472 593.0 7.81 6.9 6.6 35.087719 1.754386
29 Waltham Forest Year 2002 24.251931 3.752919 33.899984 12.832197 409.0 5.69 2.9 1.9 30.158730 7.936508
... ... ... ... ... ... ... ... ... ... ... ... ...
270 Havering Year 2010 15.662436 3.118212 9.556095 3.903852 174.0 7.09 20.9 9.8 20.779221 5.194805
271 Hillingdon Year 2010 21.093649 2.746973 37.445849 4.326502 383.0 8.26 19 7.6 21.000000 5.000000
272 Hounslow Year 2010 22.793657 2.931098 71.557939 4.585931 681.0 8.54 22.7 10 20.000000 5.000000
273 Islington Year 2010 29.550940 4.559540 56.527168 7.979913 98.0 10.95 19.8 8.4 27.096774 2.580645
274 Kensington and Chelsea Year 2010 18.409228 2.877257 63.640272 4.737340 29.0 15.93 16.2 8.9 16.216216 3.378378
275 Kingston upon Thames Year 2010 14.056276 1.581155 27.747622 2.677241 84.0 8.78 16.3 7 21.126761 4.225352
276 Lambeth Year 2010 25.822274 4.928736 54.812312 6.325895 1014.0 8.97 15.8 7 25.600000 2.400000
277 Lewisham Year 2010 25.149986 4.807052 45.839889 6.883087 528.0 7.63 20.1 7.2 17.808219 4.109589
278 Merton Year 2010 14.417283 2.276764 48.055503 3.319695 116.0 8.61 15.9 7.4 20.000000 4.615385
279 Newham Year 2010 23.418045 6.188660 136.408993 6.149410 731.0 8 27.4 10.9 19.354839 4.838710
280 Redbridge Year 2010 15.027918 3.468344 54.184675 4.410768 252.0 7.19 18.9 10.6 16.666667 4.166667
281 Richmond upon Thames Year 2010 11.475867 1.555512 19.121422 2.391177 45.0 10.35 13.8 5.6 26.804124 3.092784
282 Southwark Year 2010 27.141030 4.521741 60.754730 6.586243 828.0 9.15 15.4 8 27.067669 2.255639
283 Sutton Year 2010 12.006064 2.474939 15.239810 3.641157 235.0 7.36 25.7 8 19.354839 4.838710
284 Tower Hamlets Year 2010 25.362144 5.545117 83.534422 6.698695 1268.0 8.88 19.1 9.8 24.000000 2.400000
285 Waltham Forest Year 2010 21.475617 5.318004 77.593082 5.703433 490.0 7.81 27.5 12.4 23.076923 4.615385
286 Wandsworth Year 2010 16.082876 2.741116 48.814863 3.886929 239.0 9.79 12.7 5.7 21.538462 3.846154
287 Westminster Year 2010 40.246424 2.523473 78.185922 6.069312 577.0 12.11 12.9 7.1 18.181818 4.752066
288 Barking and Dagenham Year 2011 22.023323 6.142692 39.407213 7.074165 384.0 6.56 30.5 10.3 13.157895 5.263158
289 Barnet Year 2011 11.618345 2.893046 37.562874 3.619494 666.0 10.08 13.5 5.9 14.843750 3.125000
290 Bexley Year 2011 11.689450 3.135072 10.373500 3.588744 357.0 6.77 21.4 7.1 26.388889 5.555556
291 Brent Year 2011 23.337443 4.318946 70.045453 4.586859 852.0 11.19 17.2 6.7 19.791667 4.166667
292 Bromley Year 2011 14.728517 2.993951 9.199688 3.373894 394.0 8.06 14.5 5.9 20.588235 4.901961
293 Camden Year 2011 25.194582 3.414586 53.063157 5.470814 236.0 13.96 11.6 5.1 21.428571 2.941176
294 Croydon Year 2011 18.313392 4.425973 24.982250 4.712457 583.0 7.27 21.1 7.5 18.421053 5.263158
295 Ealing Year 2011 21.036562 3.752674 52.836970 4.201968 673.0 9.42 13.2 5.9 19.230769 3.846154
296 Enfield Year 2011 13.177887 5.119604 25.382870 5.906859 893.0 8.26 18 6.9 19.780220 4.395604
297 Greenwich Year 2011 20.772419 4.559690 36.517824 5.682342 503.0 8.17 26.9 6.5 23.287671 4.109589
298 Hackney Year 2011 21.842205 6.008882 42.650180 6.552089 1033.0 9.21 20.9 8.3 25.510204 3.061224
299 Hammersmith and Fulham Year 2011 23.815396 3.639699 61.596037 4.681710 169.0 13.11 14.7 5.7 20.869565 3.478261

300 rows × 12 columns

In [984]:
# Checking the shape of the new data frame
combined.shape
Out[984]:
(480, 12)
In [985]:
# Confirming there are no missing values (note the parentheses on .sum() —
# without them pandas only echoes the bound method rather than the counts)
combined.isnull().sum()
In [986]:
# checking data types
combined.dtypes
Out[986]:
Borough                                                         object
Year                                                            object
Violence Rate                                                  float64
Job Seekers Allowance Claimants (Rate)                         float64
National Insurance Number Registrations (Immigration)(Rate)    float64
Income Support Claimants (Rate)                                float64
Affordable Housing Supply                                      float64
Ratio of House Prices to Earnings                               object
Personal Insolvency (Rate)                                      object
Personal Bankruptcy (Rate)                                      object
Proportion of pubs/bars                                        float64
Proportion of clubs                                            float64
dtype: object
In [987]:
# Converting object columns to float so regression models can be built
combined = combined.astype({'Ratio of House Prices to Earnings': 'float64'})
combined = combined.astype({'Personal Insolvency (Rate)': 'float64'})
combined = combined.astype({'Personal Bankruptcy (Rate)': 'float64'})
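As a side note, `astype` raises if a column contains any non-numeric entry. A more forgiving alternative (a sketch on toy data, not the notebook's frame) is `pd.to_numeric` with `errors='coerce'`, which turns unparseable values into NaN so they can be inspected afterwards:

```python
import pandas as pd

# Toy column mixing numeric strings with a '-' placeholder (hypothetical data).
df = pd.DataFrame({'Personal Insolvency (Rate)': ['4', '4.5', '-']})

# errors='coerce' converts what it can and maps the rest to NaN.
df['Personal Insolvency (Rate)'] = pd.to_numeric(
    df['Personal Insolvency (Rate)'], errors='coerce')
print(df.dtypes)
```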

Linear Regression Analysis (Ordinary Least-Squares Regression)

To test the relationship between the violence rate and the socioeconomic factors we are going to build multiple regression models. As observed earlier, the borough of Westminster stood apart from all other boroughs with a very high crime rate, so we will first fit the models on all boroughs and then exclude Westminster to see whether this affects model performance.

In [988]:
# Checking the variables for multicollinearity. A strong correlation between
# predictors would make it difficult to distinguish their individual effects on the model
corr = combined.corr()
display(corr)
sns.heatmap(corr, cmap = 'RdBu')
Violence Rate Job Seekers Allowance Claimants (Rate) National Insurance Number Registrations (Immigration)(Rate) Income Support Claimants (Rate) Affordable Housing Supply Ratio of House Prices to Earnings Personal Insolvency (Rate) Personal Bankruptcy (Rate) Proportion of pubs/bars Proportion of clubs
Violence Rate 1.000000 0.317063 0.437966 0.490313 0.240855 0.202083 -0.058031 0.155861 0.230532 -0.094686
Job Seekers Allowance Claimants (Rate) 0.317063 1.000000 0.312494 0.585028 0.495870 -0.308302 0.336927 0.409222 0.282911 -0.033212
National Insurance Number Registrations (Immigration)(Rate) 0.437966 0.312494 1.000000 0.051019 0.320678 0.406339 0.157725 0.107766 -0.271592 -0.329274
Income Support Claimants (Rate) 0.490313 0.585028 0.051019 1.000000 0.247866 -0.427692 -0.061110 0.435720 0.620484 0.270231
Affordable Housing Supply 0.240855 0.495870 0.320678 0.247866 1.000000 -0.177251 0.314050 0.271199 0.155479 -0.139953
Ratio of House Prices to Earnings 0.202083 -0.308302 0.406339 -0.427692 -0.177251 1.000000 -0.151475 -0.299712 -0.372643 -0.480656
Personal Insolvency (Rate) -0.058031 0.336927 0.157725 -0.061110 0.314050 -0.151475 1.000000 0.687010 -0.209028 -0.009169
Personal Bankruptcy (Rate) 0.155861 0.409222 0.107766 0.435720 0.271199 -0.299712 0.687010 1.000000 0.237531 0.176861
Proportion of pubs/bars 0.230532 0.282911 -0.271592 0.620484 0.155479 -0.372643 -0.209028 0.237531 1.000000 0.146306
Proportion of clubs -0.094686 -0.033212 -0.329274 0.270231 -0.139953 -0.480656 -0.009169 0.176861 0.146306 1.000000
Out[988]:
<matplotlib.axes._subplots.AxesSubplot at 0x20e4002a048>

Observations: As we can see above, there is no strong correlation between the predictor factors, so no variables need to be removed on that basis. Note also that, although some predictors show a moderate correlation with the violence rate, the majority demonstrate only a very weak correlation.

In [989]:
# Plotting the correlation
pd.plotting.scatter_matrix (combined, alpha = 1 , figsize = (30,30), grid = True)
plt.show()
In [990]:
# Describing the data
combined.describe()
Out[990]:
Violence Rate Job Seekers Allowance Claimants (Rate) National Insurance Number Registrations (Immigration)(Rate) Income Support Claimants (Rate) Affordable Housing Supply Ratio of House Prices to Earnings Personal Insolvency (Rate) Personal Bankruptcy (Rate) Proportion of pubs/bars Proportion of clubs
count 480.000000 480.000000 480.000000 480.000000 480.000000 480.000000 480.000000 480.000000 480.000000 480.000000
mean 22.764695 2.858382 43.875402 5.390572 368.829167 9.634604 14.141528 6.817956 21.323215 4.086258
std 7.174613 1.293782 23.061520 3.362201 286.718195 3.109554 5.944237 3.741563 5.203594 1.505440
min 10.132469 0.347571 5.315520 0.693382 7.000000 4.680000 2.700000 1.774448 7.692308 1.081081
25% 17.337771 1.834305 27.181719 2.640038 168.750000 7.767500 10.000000 3.600000 17.636703 2.941176
50% 22.169890 2.646903 42.161018 4.734155 303.500000 8.725000 14.094072 5.800000 21.118936 4.054054
75% 27.436358 3.774764 58.364719 7.606769 483.750000 10.640000 18.125000 9.600000 24.712207 5.000000
max 48.723649 6.340935 145.287680 19.737025 1995.000000 30.490000 31.600000 18.300000 36.046512 9.756098
In [991]:
# Testing the response and predictor variables (columns 2-11) for normality

for col in range(2, 12):

    print(combined.columns[col])
    X = combined.iloc[:, col]

    # Perform the Shapiro-Wilk test; a low p-value is evidence AGAINST normality
    print(stats.shapiro(X)[1])
Violence Rate
2.620166128508572e-08
Job Seekers Allowance Claimants (Rate)
4.546798770821425e-10
National Insurance Number Registrations (Immigration)(Rate)
5.03999952883305e-09
Income Support Claimants (Rate)
6.770427182214642e-14
Affordable Housing Supply
3.102623292011135e-20
Ratio of House Prices to Earnings
2.6300823739599365e-24
Personal Insolvency (Rate)
0.0002898563107009977
Personal Bankruptcy (Rate)
1.7313927730216137e-14
Proportion of pubs/bars
0.1662525236606598
Proportion of clubs
1.8488352182544077e-08

Observations: In the cell above, we test each variable for normality using the Shapiro-Wilk test. A low p-value (generally < 0.05) implies that the variable is not normally distributed, which is the case for almost all of our chosen factors. However, this does not mean we cannot build valid models, since the variables themselves need not be normal. What matters after fitting a model is to check the normality of the residuals, which is done below.

In [1003]:
# Semi-saturated model (6 of the 9 predictors): first we fit a model without all the variables, so that later we can see whether adding predictors influences the model

X = combined[ ['Job Seekers Allowance Claimants (Rate)',
       'National Insurance Number Registrations (Immigration)(Rate)',
       'Income Support Claimants (Rate)', 'Affordable Housing Supply',  'Proportion of pubs/bars','Proportion of clubs']]
X = sm.add_constant(X)
y = combined.iloc[:,2]

model = sm.OLS(y, X).fit()
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:          Violence Rate   R-squared:                       0.440
Model:                            OLS   Adj. R-squared:                  0.433
Method:                 Least Squares   F-statistic:                     61.95
Date:                Sat, 14 Dec 2019   Prob (F-statistic):           1.52e-56
Time:                        13:07:57   Log-Likelihood:                -1487.3
No. Observations:                 480   AIC:                             2989.
Df Residuals:                     473   BIC:                             3018.
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
===============================================================================================================================
                                                                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------------------------------------
const                                                          13.1544      1.807      7.279      0.000       9.603      16.706
Job Seekers Allowance Claimants (Rate)                         -1.0032      0.273     -3.674      0.000      -1.540      -0.467
National Insurance Number Registrations (Immigration)(Rate)     0.1369      0.013     10.210      0.000       0.111       0.163
Income Support Claimants (Rate)                                 1.1834      0.121      9.784      0.000       0.946       1.421
Affordable Housing Supply                                       0.0006      0.001      0.622      0.534      -0.001       0.003
Proportion of pubs/bars                                         0.0959      0.068      1.420      0.156      -0.037       0.229
Proportion of clubs                                            -0.5352      0.188     -2.849      0.005      -0.904      -0.166
==============================================================================
Omnibus:                       46.776   Durbin-Watson:                   1.457
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               64.163
Skew:                           0.715   Prob(JB):                     1.17e-14
Kurtosis:                       4.077   Cond. No.                     3.45e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.45e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
In [1012]:
# Visualising residual distribution 
mu = np.mean(model.resid)
sigma = np.std(model.resid)
pdf = stats.norm.pdf(sorted(model.resid), mu, sigma)
plt.hist(model.resid, bins=50, density=True)
plt.plot(sorted(model.resid), pdf, linewidth=2)
plt.show()

Observations: In the above model, the following factors are highly significant (p < 0.05), giving strong evidence of an impact on the violence rate: 'Job Seekers Allowance Claimants (Rate)', 'National Insurance Number Registrations (Immigration)(Rate)', 'Income Support Claimants (Rate)' and 'Proportion of clubs'. After fitting the model, two variables were shown to be insignificant in explaining the variance in violent crime rates: 'Proportion of pubs/bars' and 'Affordable Housing Supply'. However, we cannot conclude that these two factors are never significant for the violence rate; they may be insignificant in this particular model only because of the presence of other factors such as 'National Insurance Number Registrations (Immigration)(Rate)'. The R-squared above 40% means that more than 40% of the variation in the violent crime rate can be explained by the model, which is impressive given that not all predictors are included. We can also observe that the residuals are reasonably normally distributed, which is a good sign that the fitted model is valid.

In [1013]:
# Fully saturated model. All predictor factors


X = combined[ ['Job Seekers Allowance Claimants (Rate)',
       'National Insurance Number Registrations (Immigration)(Rate)',
       'Income Support Claimants (Rate)', 'Affordable Housing Supply',  'Proportion of pubs/bars','Proportion of clubs', 'Ratio of House Prices to Earnings',
       'Personal Insolvency (Rate)', 'Personal Bankruptcy (Rate)']]
X = sm.add_constant(X)
y = combined.iloc[:,2]

model1 = sm.OLS(y, X).fit()
print(model1.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:          Violence Rate   R-squared:                       0.517
Model:                            OLS   Adj. R-squared:                  0.508
Method:                 Least Squares   F-statistic:                     55.89
Date:                Sat, 14 Dec 2019   Prob (F-statistic):           9.78e-69
Time:                        13:28:18   Log-Likelihood:                -1451.8
No. Observations:                 480   AIC:                             2924.
Df Residuals:                     470   BIC:                             2965.
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
===============================================================================================================================
                                                                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------------------------------------
const                                                          -0.5534      2.464     -0.225      0.822      -5.395       4.289
Job Seekers Allowance Claimants (Rate)                         -0.6342      0.290     -2.188      0.029      -1.204      -0.065
National Insurance Number Registrations (Immigration)(Rate)     0.0879      0.014      6.157      0.000       0.060       0.116
Income Support Claimants (Rate)                                 1.4943      0.134     11.161      0.000       1.231       1.757
Affordable Housing Supply                                       0.0022      0.001      2.194      0.029       0.000       0.004
Proportion of pubs/bars                                         0.1452      0.067      2.172      0.030       0.014       0.277
Proportion of clubs                                             0.0965      0.190      0.507      0.612      -0.277       0.470
Ratio of House Prices to Earnings                               0.8931      0.109      8.223      0.000       0.680       1.106
Personal Insolvency (Rate)                                      0.1819      0.079      2.289      0.023       0.026       0.338
Personal Bankruptcy (Rate)                                     -0.3315      0.122     -2.709      0.007      -0.572      -0.091
==============================================================================
Omnibus:                       32.629   Durbin-Watson:                   1.480
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               43.756
Skew:                           0.545   Prob(JB):                     3.15e-10
Kurtosis:                       3.999   Cond. No.                     5.04e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.04e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
In [1016]:
# Visualising residual distribution 
mu1 = np.mean(model1.resid)
sigma1 = np.std(model1.resid)
pdf1 = stats.norm.pdf(sorted(model1.resid), mu1, sigma1)
plt.hist(model1.resid, bins=50, density=True)
plt.plot(sorted(model1.resid), pdf1, linewidth=2)
plt.show()

Observations: It is interesting that some factors that were previously insignificant became significant in the fully saturated model, while some previously significant predictors became insignificant (e.g. proportion of clubs). This illustrates that the significance of a predictor can depend on which other predictors are present in the model. Since more predictors are included, R-squared was expected to be higher; here it shows that over half the variance in violent crime rates can be explained by the factors listed above. More importantly, the adjusted R-squared (> 50%) has also gone up, which means the additional factors genuinely improve the performance of the model. As in the previous model, the residuals are reasonably normally distributed, so we can draw valid conclusions from it.

Now let's remove Westminster as an 'outlier' from our analysis to see whether it has any effect on the regression models

In [994]:
# Creating new data frame without Westminster
combined_w = combined[combined['Borough'] != 'Westminster']
In [995]:
# Converting object columns to float so regression models can be built
combined_w = combined_w.astype({'Ratio of House Prices to Earnings': 'float64',
                                'Personal Insolvency (Rate)': 'float64',
                                'Personal Bankruptcy (Rate)': 'float64'})
In [996]:
combined_w.head(33)
Out[996]:
Borough Year Violence Rate Job Seekers Allowance Claimants (Rate) National Insurance Number Registrations (Immigration)(Rate) Income Support Claimants (Rate) Affordable Housing Supply Ratio of House Prices to Earnings Personal Insolvency (Rate) Personal Bankruptcy (Rate) Proportion of pubs/bars Proportion of clubs
0 Barking and Dagenham Year 2002 26.401053 2.952248 20.253383 15.503653 204.0 4.68 4.0 3.2 19.047619 9.523810
1 Barnet Year 2002 16.365519 2.509677 28.164072 9.222074 231.0 7.44 4.5 3.4 21.951220 4.065041
2 Bexley Year 2002 16.223765 1.811202 6.935214 7.831012 262.0 5.56 4.9 4.0 27.142857 7.142857
3 Brent Year 2002 23.333370 3.780913 51.338043 12.663241 328.0 8.10 2.8 2.2 24.731183 6.451613
4 Bromley Year 2002 15.937007 1.868097 8.303437 7.616807 267.0 6.48 4.0 2.7 27.551020 7.142857
5 Camden Year 2002 28.518620 3.661282 38.245919 14.042012 131.0 9.72 3.6 3.0 25.108225 3.463203
6 Croydon Year 2002 22.783716 2.672229 18.270833 9.773511 511.0 5.82 4.4 3.3 29.508197 6.557377
7 Ealing Year 2002 23.098415 2.704908 34.843335 10.619675 257.0 7.08 3.1 2.3 28.301887 4.716981
8 Enfield Year 2002 19.921993 2.875181 21.936058 12.014928 493.0 6.03 4.2 3.1 29.069767 4.651163
9 Greenwich Year 2002 31.118496 3.537161 20.987461 13.991640 325.0 5.99 5.5 4.6 32.857143 5.714286
10 Hackney Year 2002 32.280848 5.132922 34.668635 19.737025 503.0 7.37 4.2 3.7 36.046512 3.488372
11 Hammersmith and Fulham Year 2002 25.135328 3.395747 50.066156 11.553728 136.0 8.55 4.5 3.5 26.732673 4.950495
12 Haringey Year 2002 24.261162 4.603853 38.274967 15.673485 233.0 7.03 4.0 3.1 30.000000 3.750000
13 Harrow Year 2002 13.313805 2.004209 22.586182 8.789873 176.0 7.59 3.8 3.1 23.880597 7.462687
14 Havering Year 2002 17.333618 1.561857 5.315520 8.662592 96.0 5.63 3.4 2.8 29.032258 6.451613
15 Hillingdon Year 2002 20.168945 1.852870 15.957151 8.813543 208.0 6.47 4.0 2.5 30.434783 6.521739
16 Hounslow Year 2002 32.781897 1.908193 37.485567 10.559598 190.0 7.00 3.7 2.6 30.769231 7.692308
17 Islington Year 2002 35.238011 4.488591 34.517385 17.653462 246.0 8.71 5.3 4.6 35.099338 2.649007
18 Kensington and Chelsea Year 2002 21.186984 2.428878 45.835953 9.354960 75.0 10.68 4.8 4.4 19.594595 3.378378
19 Kingston upon Thames Year 2002 19.092233 1.585016 17.966820 5.663454 89.0 7.16 5.2 4.7 29.850746 5.970149
20 Lambeth Year 2002 33.515113 4.892338 35.505864 13.605098 304.0 8.02 4.7 4.3 31.496063 3.149606
21 Lewisham Year 2002 22.506327 4.098478 26.293732 13.083715 403.0 5.85 3.2 2.7 30.434783 4.347826
22 Merton Year 2002 18.156916 2.104156 30.682202 7.132536 135.0 6.81 4.0 2.8 22.580645 3.225806
23 Newham Year 2002 31.713948 4.188469 51.380634 16.895646 383.0 6.47 3.9 3.2 30.769231 6.153846
24 Redbridge Year 2002 18.688107 2.302662 20.179527 9.660078 259.0 6.22 3.4 2.8 21.212121 6.060606
25 Richmond upon Thames Year 2002 14.517701 1.586414 17.181584 5.436786 32.0 7.26 5.6 4.6 31.372549 2.941176
26 Southwark Year 2002 32.079573 4.769097 33.647743 14.699659 372.0 7.54 3.9 3.5 34.645669 3.937008
27 Sutton Year 2002 16.908945 1.613217 10.923072 7.197175 283.0 6.29 5.1 4.2 27.272727 3.636364
28 Tower Hamlets Year 2002 36.410006 5.252472 45.348975 17.457472 593.0 7.81 6.9 6.6 35.087719 1.754386
29 Waltham Forest Year 2002 24.251931 3.752919 33.899984 12.832197 409.0 5.69 2.9 1.9 30.158730 7.936508
30 Wandsworth Year 2002 19.474969 2.674663 36.171274 8.936432 174.0 7.52 2.7 2.2 24.409449 5.511811
32 Barking and Dagenham Year 2003 29.492810 3.224947 22.435468 10.485831 317.0 6.22 8.8 6.7 20.930233 9.302326
33 Barnet Year 2003 18.076333 2.381042 29.635337 5.497830 243.0 8.40 5.2 3.8 20.634921 3.968254
In [997]:
# Resetting Index 
combined_w.reset_index(drop= True,inplace = True)
In [998]:
# Semi-saturated model (6 of the 9 predictors): first we fit a model without all the variables, so that later we can see whether adding predictors influences the model

X = combined_w[ ['Job Seekers Allowance Claimants (Rate)',
       'National Insurance Number Registrations (Immigration)(Rate)',
       'Income Support Claimants (Rate)', 'Affordable Housing Supply',  'Proportion of pubs/bars','Proportion of clubs']]
X = sm.add_constant(X)
y = combined_w.iloc[:,2]

model2 = sm.OLS(y, X).fit()
print(model2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:          Violence Rate   R-squared:                       0.492
Model:                            OLS   Adj. R-squared:                  0.485
Method:                 Least Squares   F-statistic:                     73.90
Date:                Sat, 14 Dec 2019   Prob (F-statistic):           2.96e-64
Time:                        12:00:17   Log-Likelihood:                -1373.2
No. Observations:                 465   AIC:                             2760.
Df Residuals:                     458   BIC:                             2789.
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
===============================================================================================================================
                                                                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------------------------------------
const                                                          13.3705      1.570      8.517      0.000      10.286      16.455
Job Seekers Allowance Claimants (Rate)                         -0.5571      0.240     -2.319      0.021      -1.029      -0.085
National Insurance Number Registrations (Immigration)(Rate)     0.1050      0.012      8.785      0.000       0.082       0.129
Income Support Claimants (Rate)                                 1.0816      0.107     10.153      0.000       0.872       1.291
Affordable Housing Supply                                       0.0013      0.001      1.439      0.151      -0.000       0.003
Proportion of pubs/bars                                         0.1240      0.059      2.111      0.035       0.009       0.239
Proportion of clubs                                            -0.7537      0.164     -4.602      0.000      -1.076      -0.432
==============================================================================
Omnibus:                       11.151   Durbin-Watson:                   1.451
Prob(Omnibus):                  0.004   Jarque-Bera (JB):                9.519
Skew:                           0.277   Prob(JB):                      0.00857
Kurtosis:                       2.570   Cond. No.                     3.44e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.44e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
In [999]:
# Fully saturated model.


X = combined_w[ ['Job Seekers Allowance Claimants (Rate)',
       'National Insurance Number Registrations (Immigration)(Rate)',
       'Income Support Claimants (Rate)', 'Affordable Housing Supply',  'Proportion of pubs/bars','Proportion of clubs', 'Ratio of House Prices to Earnings',
       'Personal Insolvency (Rate)', 'Personal Bankruptcy (Rate)']]
X = sm.add_constant(X)
y = combined_w.iloc[:,2]

model3 = sm.OLS(y, X).fit()
print(model3.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:          Violence Rate   R-squared:                       0.542
Model:                            OLS   Adj. R-squared:                  0.533
Method:                 Least Squares   F-statistic:                     59.89
Date:                Sat, 14 Dec 2019   Prob (F-statistic):           1.16e-71
Time:                        12:00:19   Log-Likelihood:                -1348.9
No. Observations:                 465   AIC:                             2718.
Df Residuals:                     455   BIC:                             2759.
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
===============================================================================================================================
                                                                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------------------------------------
const                                                           2.3631      2.244      1.053      0.293      -2.046       6.773
Job Seekers Allowance Claimants (Rate)                         -0.5488      0.259     -2.117      0.035      -1.058      -0.039
National Insurance Number Registrations (Immigration)(Rate)     0.0813      0.013      6.388      0.000       0.056       0.106
Income Support Claimants (Rate)                                 1.4156      0.120     11.821      0.000       1.180       1.651
Affordable Housing Supply                                       0.0019      0.001      2.135      0.033       0.000       0.004
Proportion of pubs/bars                                         0.2021      0.059      3.399      0.001       0.085       0.319
Proportion of clubs                                            -0.2905      0.174     -1.667      0.096      -0.633       0.052
Ratio of House Prices to Earnings                               0.5950      0.104      5.737      0.000       0.391       0.799
Personal Insolvency (Rate)                                      0.2699      0.071      3.812      0.000       0.131       0.409
Personal Bankruptcy (Rate)                                     -0.4490      0.109     -4.117      0.000      -0.663      -0.235
==============================================================================
Omnibus:                        7.552   Durbin-Watson:                   1.358
Prob(Omnibus):                  0.023   Jarque-Bera (JB):                7.006
Skew:                           0.248   Prob(JB):                       0.0301
Kurtosis:                       2.661   Cond. No.                     5.16e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.16e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

Observations: The removal of Westminster improved the performance of both the semi-saturated (R-sq 49%) and fully saturated (R-sq 54%) models to some extent, but did not have a dramatic effect compared with the models fitted on all boroughs.